* Some early staging for README updates and pyproject updates for a 0.6.0 release for diskannpy.
* Trying to fix the CI badge to point toward main's latest build
* Updating documentation for pdoc generation
* Documentation updates. Tightened up the API to drop list support (there were entirely too many cases where it wouldn't work, and it's easier to just tell people to convert it themselves)
* Some module reorganization to make pdoc actually display the docstrings for variables re-exported at the top level
* A copy paste happened that shouldn't have.
* Updating the apps to use the new 0.6.0 api
* Addressing PR feedback
* Some of the documentation changes didn't get made in both from_file and the constructor
* While simply creating a unit test to repro Issue #400, I found a number of bugs that I needed to address just to get it to work the way I had intended. This does not yet have what I would consider a comprehensive suite of test coverage for the DynamicMemoryIndex, but we at least do save it with the metadata file, we can load it correctly, and saving *always* calls consolidate_deletes() first if any item has been marked for deletion.
* We actually cannot save without compacting before save anyway. Removing the parameter from save() and hardcoding it to True until we can actually support it.
* Addressing some PR comments and readying a 0.5.0.rc5 release
* Identified the appropriate build flags to get a working python build that doesn't rely on -march=native or -mtune=native. We've run benchmarks on multiple computers that indicate the only important flag other than -mavx2 -msse2 -mfma is -funroll-loops. Optimization levels such as -O1, -O2, or -O3 actually make for less performant code. -Ofast is unavailable for use in Python, as it causes problems with floating point math in Python.
* 1.22 was left in a comment despite 1.25 being the value specified
* Python 3.8 is not supported by numpy 1.25, so we're removing it.
* Added utilities to standardize help across cli tools. #370
* Made three option groupings (required/optional/print)
* Moved common parameter descriptions to a common file. #370
* Updated usage statement for search_disk_app #370
* Updated range_search_disk_index to use the new required/optional format. #370
* Updated test apps to use the new help format. #370
* Fixed format issue. #370
* Updated help format for the 'build' apps. #370
* Fixed code formatting. #370
* Added src/*.hpp to the clang format. #370
* Moved header into the headers directory. #370
* Added missing configs. #370
* Removed superfluous paths from include. #370
* Added #pragma once. #370
* Typo fixes. #370
* Fixed capitalization of constant. #370
* Make fail_if_recall description more accurate. #370
* Changed to using set notation. #370
* Better explanations for some options. #370
* Added short explanation of file format. #370
---------
Co-authored-by: Jon McLean <none@example.com>
Co-authored-by: Jonathan McLean <Jonathan.McLean@microsoft.com>
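Note on the dropped list support mentioned above: callers who still hold plain Python lists can convert them before calling into the package. The following is a minimal sketch of that conversion, not part of diskannpy itself; the helper name `as_vector_array` and the float32 default are assumptions made for illustration.

```python
import numpy as np


def as_vector_array(vectors, dtype=np.float32):
    """Convert a list of equal-length lists into a 2-D, C-contiguous ndarray.

    Hypothetical helper: the name and the float32 default are illustrative
    assumptions, not part of the diskannpy API.
    """
    arr = np.ascontiguousarray(vectors, dtype=dtype)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D collection of vectors, got ndim={arr.ndim}")
    return arr


vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
arr = as_vector_array(vectors)
```

Doing the conversion on the caller's side keeps the dtype and memory layout explicit, rather than leaving them to be guessed inside the library.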
* Removed the logger and verified that the logging capability is the root cause of our consistent segfault errors in python. Perhaps it also will fix any issues in our label test too? I'd like to push it to GH and see.
* Formatting fixes
* Revert "Formatting fixes"
This reverts commit 9042595614.
* Revert "Removed the logger and verified that the logging capability is the root cause of our consistent segfault errors in python. Perhaps it also will fix any issues in our label test too? I'd like to push it to GH and see."
This reverts commit 7561009932.
* The custom logging implementation is causing segfaults in python. We're not sure exactly where, but this is the easiest and quickest way to get a working python release.
* All the integration tests are failing, and there's a chance the virtual dtor on AbstractDataStore might be the culprit, though I am not sure why. I'm hoping it is so it won't fall on the logging changes.
* Formatting. Again.
Github actions fix: composite action `python-wheel` publishes wheels to the `wheels` artifact. `python-release` workflow then looks for it in the `dist` artifact, which does not exist.
This is a CICD change only.
* small bug fix
* test ubuntu fail
* formatting
* re-triggering unit test
* cause error, remove two character params
* cause error, remove two character params
* unit test fix
* clean up code
* add more accurate error handling
* fix filter build
* re-trigger test
* try lower recall number
* test with more values
* revert back to test unit test
* Refactor of diskannpy module code.
* 0.5.0.rc1 for python and enabling the build-python portion of the pr-test process.
* clang-format changes
* In theory this should speed up the python build drastically by only building the wheel for the python version and OS we're attempting to fan out to in our CICD job tree
* Missed a dollar sign
* Copy/pasting left a CICD step name that implied we were running a code formatting check when instead we were building a wheel. This is now fixed.
* In theory, readying the release action too. We won't know if it works until it merges and we cut a release, but at least the paths have been fixed
* Designated initializers just happened to work on linux but shouldn't have as they weren't added until cpp20
* Formatting
* some bug fixes when EXEC_ENV_OLS is enabled
* avoid unit test failure
* unit test testing
* changed based on gopal's suggestion
* update load_impl(AlignedFileReader &reader)
* change the load_impl to be identical to objectstore
* remove blank
* remove _u, _s typedefs
* added some seed files
* add seed files
* New distance metric hierarchy
* Refactoring changes
* Fixing compile errors in refactored code
* Fixing compile errors
* DiskANN Builds with initial refactoring changes
* Saving changes for Ravi
* More refactoring
* Refactor
* Fixed most of the bugs related to _data
* Post merge with main
* Refactored version which compiles on Windows
* now compiles on linux
* minor clean-up
* minor bug fix
* minor bug
* clang format fix + build error fix
* clang format fix
* minor changes
* added back the fast_l2 feature
* added back set_start_points in index.cpp
* Version for review
* Incorporating Harsha's comments - 2
* move implementation of abstract data store methods to a cpp file
* clang format
* clang format
* Added slot manager file (empty) and fixed compile errors
* fixed a linux compile error
* clang
* debugging workflow failure
* clang
* more debug
* more debug
* debug for workflow
* remove slot manager
* Removed the #ifdef WINDOWS directive from class definitions
* Refactoring alignment factor into distance hierarchy
* Fixing cosine distance
* Ensuring we call preprocess_query always
* Fixed distance invocations
* fixed cosine bug, clang-formatted
* cleaned up and added comments
* clang-formatted
* more clang-format
* clang-format 3
* remove deleted code in scratch.cpp
* reverted clang to Microsoft
* small change
* Removed slot_manager from this PR
* newline at EOF in_mem_Graph_store.cpp
* rename distance_metric to distance_fn
* resolving PR comments
* minor bug fix for initialization
* creating index_factory
* using index factory to build inmem index
* clang format fix
* minor bug fix
* fixing build error
* replacing mem_store with abstract_mem_store + injecting data_store to Index
* minor fix
* clang format fix
* commenting data_store injection to prevent double invocation and mem leak (for now)
* fixing the build for filters
* moving abstract index to abstract_index.h
* IndexBuildParamsbuilder to build IndexBuildParams properly with error checking
* fixing build errors
* fixing minor error
* refactoring index search to be simple
* clang format fix
* refactoring search_mem_index to use index factory
* clang fix
* minor fix
* minor fix for build
* optimize for fast l2 restore
* removing comments
* removing comments
* adding templating to IndexFactory (can't avoid it anymore)
* fixing build error
* fixing ubuntu build error
* ubuntu build exception fix
* passing num_pq_bytes
* giving one more shot to config-driven arch with boost::any (type erasure)
* clang fix
* modifying search to use boost::any
* fixing ubuntu build errors/warning
* created indexconfigbuilder and fixed a typo
* fixing error in pq build
* some comments + lazy_delete impl
* bumping to std c++17 & replacing boost::any with std::any
* clang fix
* c++ std 17 for ubuntu
* minor fix
* converting search to batch_search + A vector wrapper using std::any to store vector as a shared ptr
* adding AnyVector to encapsulate vector in std::any + adding basic yaml parser(WIP)
* adding wrapper code for vector and set, checked with Andrija
* fixing ubuntu build error
* trying to resolve ubuntu build error
* testing test streaming index with IndexFactory
* fixing ubuntu build error
* fixing search for test insert delete consolidate
* refactored test_streaming_scenario
* refactored test_insert_delete_consolidate to use AbstractIndex and Indexfactory
* fixing ubuntu build error
* making build method in abstract index consistent
* some code cleanup + abstract_cpp to add implementation
* removing comments and code cleanup
* build error fix
* fixing -Wreorder warning
* separating build structs to their header + refactor search and remove batch search
* fixing ubuntu build errors
* resolving segfault error from search_mem_index
* fixing query_result_tag allocation
* minor update
* search fix
* trying to fix windows latest build for dynamic index
* adding temp logging to debug windows latest build issue
* removing logging for debug
* fixing windows latest build error for dynamic index search
* moving any wrappers to separate file + organizing code
* fixing check error
* updating private var naming convention
* minor update
* unraveling search methods in abstract index. Iteration 1
* minor fix
* unused vars remove
* returning a unique_ptr to Abstract Index from index factory
* adding implementation from abstract_index.h to abstract_index.cpp
* making abstract index api more explicit (experiment)
* some code cleanup
* removing detected memory leaks (free up index)
* separating enums for data and graph strategy
* Index ctor(config) now uses injected datastore from IndexFactory
* distance in index population in new config ctor
* resolving some comments from Andrija
* Resolving some restructuring comments by Andrija
* minor fix
* fixing ubuntu build error
* warning fix
* simplified get() in anywrappers
* making index config a unique ptr and owned by IndexFactory
* removing complex if/else calling recursively + added unimplemented TagT to AbsIdx
* renaming get_instance to create_instance
* clang format fix
* removing const_cast from any_wrapper
* fixing andrija's comments
* removing warnings
---------
Co-authored-by: harsha vardhan simhadri <harsha.v.simhadri@gmail.com>
Co-authored-by: Gopal Srinivasa <gopalsr@microsoft.com>
Co-authored-by: ravishankar <rakri@microsoft.com>
Co-authored-by: Harsha Vardhan Simhadri <harsha-simhadri@users.noreply.github.com>
* fixed a bug with loading medoids for sharded filtered index, and added better caching for filtered index
clang-format
fixed minor cout error
addressed Yiyong's comments, and fixed a bug for finding medoid in sharded+filtered index
Fixed windows compile error (warnings)
Fix inefficiency in constructing reverse label map (#373)
* single loop for reverse label map
* clang formatting
* unnecessary comments removed
* minor
---------
Co-authored-by: Varun Sivashankar <t-varunsi@microsoft.com>
clang-formatted
* minor cleanup
* clang-format
---------
Co-authored-by: ravishankar <rakri@microsoft.com>
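Note on the reverse label map fix (#373) mentioned above: the commit messages do not say what the old construction looked like, so the following is only one plausible reading, written in Python rather than the project's C++. It assumes the forward map lists the labels attached to each point, and it builds the label-to-points map in a single pass over the points instead of rescanning them once per label. The function name and data shapes are hypothetical.

```python
from collections import defaultdict


def build_reverse_label_map(point_labels):
    """Build a label -> [point ids] map in one pass over the points.

    Assumption: point_labels[i] is the collection of labels for point i.
    This is an illustrative sketch, not the DiskANN implementation.
    """
    reverse = defaultdict(list)
    for point_id, labels in enumerate(point_labels):
        for label in labels:
            reverse[label].append(point_id)
    return dict(reverse)


reverse_map = build_reverse_label_map([["a", "b"], ["b"], []])
```

The work is proportional to the total number of (point, label) pairs, which is the main reason to prefer a single pass.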
* Add unit test project based on boost_unit_test_framework
* Add another dockerfile for developers
* update path
---------
Co-authored-by: Yiyong Lin <yiyolin@microsoft.com>
* Adding cosine distance - I didn't know we had that as a first level distance metric
* Making our mkl and iomp linking game more rigorously defined for the ubuntus
* Included latest as a path fragment twice on accident
* libmkl_def.so is named something different when installed via the intel oneapi installer
* Making a number of changes to homogenize our api (same parameters, minimize parameters as much as possible, etc)
* Stashing this and going to work on the CICD stuff, it's driving me nuts
* Fairly happy with the Python API now. Documentation needs another pass, the @overloads in the .pyi files need to be addressed, and documentation checked again. The apps folder also needs updating to use fire instead of argparse
* Updated build to not use tcmalloc for pybind, as well as fixed the pyproject.toml so that cibuildwheel can actually successfully build our project.
* Making a change to in-mem-static for the new api and also adjusting the comment in in-mem-dynamic a bit, though... I probably shouldn't have
* Upload data and binary files to artifact so that we can debug issues locally when a workflow fails
* use different artifact name for different scenarios
---------
Co-authored-by: Yiyong Lin <yiyolin@microsoft.com>
* UNV Search Fix for Memory
* two places to update
* clang format
* unify find_common_filters function
* fix comments
- only return size of common filters from the find_common_filters function
* dummy comments
* clang format
* Reduce repetitive calls
* changing name and return type of function
* Refactored the build processes. Broke things into components as much as
possible. We have standalone actions for the build processes to make
sure they are consistent across push or PR builds, a format-check that
doesn't rely on cmake to be there to work, and centralized our
randomized data generation into a single action that can be called in
each section.
We now are reusing as many of the steps as we can without copy/pasting,
which should ensure we're not making mistakes.
* Fixing the dynamic tests, the paths to the data were wrong
---------
Co-authored-by: yashpatel007 <patelyash1311@gmail.com>
* Use int64 for counter to fix windows compilation error
* Fix windows python bindings by adding install_lib command to move windows build output into python package
* Update to use Path instead of os
* Change batch_insert num_inserts signature to signed type for OpenMP compatibility
* Update num_inserts to int32_t per PR request
---------
Co-authored-by: Nick Caurvina <nicaurvi@microsoft.com>
* Force error on warnings and add casts to test directory
* Use size_t for index of point IDs
* Refactor iterator and conditions for printing labels
---------
Co-authored-by: David Kaczynski <dkaczynski@microsoft.com>
* added timer and QPS to static search app
* search only option to static index
* search only option to static index
* exposing metric in static function
---------
Co-authored-by: Dax Pryce <daxpryce@microsoft.com>
* Adding some diagnostics to a pr build in an attempt to see what is going on with our systems prior to running our streaming/incremental tests
* fix cast error and add some status prints to in-mem-dynamic app
* Adding unit tests for both memory and disk index builder methods
* After the refactor and polish of the API was left half done, I also left half a jillion bugs in the library. At least I'm confident that build_memory_index and StaticMemoryIndex work in some cases, whereas before they barely were getting off the ground
* Sanity checks of static index (not comprehensive coverage), and tombstone file for test_dynamic_memory_index
* Argument range checks of some of the static memory index values.
* fixes for dynamic index in python interface (#334)
* create separate default number of frozen points for dynamic indices
* consolidate works
* remove superfluous param from dynamic index
* remove superfluous param from dynamic index
* batch insert and args modification to apps
* batch insert and args modification to apps
* typo
* Committing the updated unit tests. At least the initial sanity checks of StaticMemory are done
* Fixing an error in the static memory index ctor
* Formatting python with black
* Have to disable initial load with DynamicMemoryIndex, as there is no way to build a memory index with an associated tags file yet, making it impossible to load an index without tags
* Working on unit tests and need to pull harsha's changes
* I think I aligned this such that we can execute it via command line with the right behaviors
* Providing rest of parameters build_memory_index requires
* For some reason argparse is allowing a bunch of blank space to come in on arguments and they need to be stripped. It also needs to be using the right types.
* Recall test now works
* More unit tests for dynamic memory index
* Adding different range check for alpha, as the values are only really that realistic between 1 and 2. Below 1 is an error, and above 2 we'll probably make a warning going forward
* Storing this while I cut a new branch and walk back some work for a future branch
* Undoing the auto load of the dynamic index until I can debug why my tag vector files cause an error in diskann
* Updating the documentation for the python bindings. It's a lot closer than it was.
* Fixing a unit test
* add timers to dynamic apps (#337)
* add timers to dynamic apps
* clang format
* np.uintc vs. int for dtype of tags
* fixes to types in dynamic app
* cast tags to np.uintc array
* more timers
* added example code in comments in app file
* round elapsed
* fix typo
* fix typo
---------
Co-authored-by: Harsha Vardhan Simhadri <harsha-simhadri@users.noreply.github.com>
Co-authored-by: harsha vardhan simhadri <harsha.v.simhadri@gmail.com>
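Note on the argparse whitespace and alpha range-check commits above: the two ideas can be combined by giving argparse a converter function per argument. This is a minimal sketch under stated assumptions, not the code in the apps folder. The argument names `--index_path` and `--alpha`, the helper names, and the default of 1.2 are hypothetical; the warning above 2 reflects the stated intention ("we'll probably make a warning going forward"), not confirmed behavior.

```python
import argparse
import warnings


def stripped(value: str) -> str:
    """Strip stray surrounding whitespace from a string argument."""
    return value.strip()


def alpha_value(value: str) -> float:
    """Parse alpha as a float: below 1 is an error, above 2 only warns."""
    alpha = float(value.strip())
    if alpha < 1.0:
        raise argparse.ArgumentTypeError(f"alpha must be >= 1, got {alpha}")
    if alpha > 2.0:
        warnings.warn(f"alpha={alpha} is above the realistic range of 1 to 2")
    return alpha


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical argument names, chosen only for this illustration.
    parser = argparse.ArgumentParser(description="argument cleanup sketch")
    parser.add_argument("--index_path", type=stripped, required=True)
    parser.add_argument("--alpha", type=alpha_value, default=1.2)
    return parser


args = build_parser().parse_args(["--index_path", "  /tmp/index  ", "--alpha", " 1.2 "])
```

Putting the cleanup in the `type=` callables means every consumer of `args` sees values that are already stripped and correctly typed.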
* Disabling Python builds
debian stretch no longer seems to have valid apt repos - or at least not ones that we can access - which means our cibuildwheel is failing.
Create a virtual data store base class and a derived in-mem store class. In-mem index now uses the data store class.
---------
Co-authored-by: Gopal Srinivasa <gopalsr@microsoft.com>
Co-authored-by: ravishankar <rakri@microsoft.com>
Co-authored-by: yashpatel007 <patelyash1311@gmail.com>
Fix performance gap between in-mem and SSD-based graph builds by passing an appropriate number of threads.
---------
Co-authored-by: Yiyong Lin <yiyolin@microsoft.com>
Co-authored-by: Harsha Vardhan Simhadri <harsha-simhadri@users.noreply.github.com>
* The first step in the python-api-enhancements branch. We need to fix a problem with the Parameters class with a double free or segfault on deletion.
* Removing the parameters class in favor of the IndexRead and IndexWrite parameters classes.
* API changes and python packaging changes for linux. It's almost ready for PR, but definitely ready for push.
* Suppressing the CIBuildWheel step on windows
* added in-mem static and dynamic index class to python bindings (#301)
* Advancing our version number to 0.5.0
* Some more updates as per harsha's comments on PR #300. The diskann_bindings.cpp still need some more tlc and the wrapper needs to make use of it, and we also want to include some examples, but this is a good place to bring into main and then do further enhancements
---------
Co-authored-by: Harsha Vardhan Simhadri <harsha-simhadri@users.noreply.github.com>
* updated dockerfile
* add parallel build flag to dockerfile
* Adds CI jobs to build our docker container (#302)
* Adding a step that at least builds the docker container. I'm not yet sure how I want to actually integrate tests within the container, but at the least we should verify it builds
* docker build needs a path. i honestly thought it defaulted to the CWD
---------
Co-authored-by: Dax Pryce <daxpryce@microsoft.com>