* Scripts to parse DiskANN SSD index
* Removed unnecessary code check and fixed argparse for bool
* Added support for multi-sector nodes in the disk index
---------
Co-authored-by: Gopal Srinivasa <gopalsr@microsoft.com>
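Multi-sector node support means a node record may no longer fit inside one disk sector. The sketch below shows how the address arithmetic changes when that happens. It assumes 4096-byte sectors, sector 0 reserved for metadata, and fixed-size node records stored in id order from sector 1. `NodeLayout` and its methods are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Assumed sector size (illustrative).
constexpr uint64_t kSectorLen = 4096;

constexpr uint64_t div_round_up(uint64_t x, uint64_t y) { return (x + y - 1) / y; }

// Hypothetical helper: locates a fixed-size node record in a sector-aligned file.
// Assumes sector 0 holds metadata and nodes are laid out in id order from sector 1.
struct NodeLayout {
    uint64_t max_node_len;       // bytes per node record
    uint64_t nnodes_per_sector;  // 0 when a single node spans several sectors

    explicit NodeLayout(uint64_t node_len) : max_node_len(node_len), nnodes_per_sector(0) {
        assert(node_len > 0);
        nnodes_per_sector = kSectorLen / node_len;
    }

    // Number of sectors that must be read to fetch one node.
    uint64_t sectors_per_node() const {
        return nnodes_per_sector > 0 ? 1 : div_round_up(max_node_len, kSectorLen);
    }

    // Index of the first sector holding node `id`.
    uint64_t node_sector(uint64_t id) const {
        return 1 + (nnodes_per_sector > 0 ? id / nnodes_per_sector : id * sectors_per_node());
    }

    // Byte offset of node `id` inside the buffer read starting at node_sector(id).
    uint64_t offset_in_buffer(uint64_t id) const {
        return nnodes_per_sector > 0 ? (id % nnodes_per_sector) * max_node_len : 0;
    }
};
```

With 1024-byte nodes, four nodes share a sector. With 6000-byte nodes, `nnodes_per_sector` is 0, each node occupies two whole sectors, and the node starts at offset 0 of its buffer.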
* Add simplified functions for product quantization
* Fixing formatting errors
* Fixing clang-format issue
* Fixing another set of clang-format issues
---------
Co-authored-by: Michael Popov (from Dev Box) <mipopo@microsoft.com>
* Version bump 0.7.0rc2->0.7.0
Preparing diskannpy for 0.7.0 release (filter support, static memory indices only)
* Update pyproject.toml
the GPG key from (presumably) 2019 is no longer valid
* Update pyproject.toml
* Update python-release.yml
By default, GITHUB_TOKEN no longer has write permissions; you have to explicitly request them in the specific job that needs them.
We use write permissions to update the GitHub release action that updates the published build artifacts with the results of the release flow.
* compiles, but needs verification
* fixed windows compiler warning
* minor typo
* added cosine unit test with unnormalized data
* minor typo in user prompt cosine/l2
* cosine was already supported in groundtruth, edited the message to say so
* clang-format
---------
Co-authored-by: rakri <rakri@microsoft.com>
* add 16 bytes tag type
* clean up code
* format doc
* fix compile issue
* fix compile issue
* revert change
* format doc
* separate static search and streaming search
* clean up code
* resolve comment
* format doc
* fix test
* resolve comment
The program builds the streaming index after two optional steps: 1) skipping S points from the input file and 2) batch building of initial index using B points from the input file.
After these two steps, the offset into the input file should be S + B. However, the current code first sets it to S in line 163 and then overwrites it with B in line 249, instead of adding B to the offset. The tool that `test_insert_deletes_consolidate` was based on used `+=` in the modified line.
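Here is a minimal sketch of the offset bookkeeping described above. `streaming_start_offset` and its parameter names are hypothetical; they are not the tool's actual variables.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: returns the index of the first input point to stream in, after
//   1) skipping `skip_points` (S) points, and
//   2) batch-building the initial index from `initial_batch` (B) points.
std::size_t streaming_start_offset(std::size_t skip_points, std::size_t initial_batch) {
    std::size_t offset = skip_points;  // after step 1: offset == S
    offset += initial_batch;           // after step 2: offset == S + B (using `=` here would leave only B)
    return offset;
}
```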
* Added PQ distance hierarchy
Changes to CMakeLists
PQDataStore version that builds correctly
Clang-format
* Fixing compile issues after rebase to main
* minor renaming functions
* fixed small bug post rebasing with index factory
* Changes to index factory to support PQDataStore
* Merged graph_store and pq_data_store
* Implementing preprocessing for inmemdatastore
* Incorporating code review comments
* minor bugfix for PQ data allocation
* clang-formatted
* Incorporating CR comments
* Fixing compile error
* minor bug fix + clang-format
* Update pq.h
* Fixing warnings about struct/class incompatibility
---------
Co-authored-by: Gopal Srinivasa <gopalsr@microsoft.com>
Co-authored-by: ravishankar <rakri@microsoft.com>
Co-authored-by: gopalrs <33950290+gopalrs@users.noreply.github.com>
* add fix for memory leak
* CMake change to enable tcmalloc
* add hotfix to CMake for Boost and tcmalloc
* fix indentation
* indentation
* move the CMake set ON to after cmake_minimum_required
* unset tcmalloc for PYBIND
* unset environment variable beforehand
* set off
* exclude the compile definition for pybind
* disable for pybind
* Fixing the same bug I had in static disk index inside of static memory index as well.
* Unit tests and a better understanding of why the unit tests were successful despite this bug
* Halfway approach to the new indexfactory, but it doesn't have the same featureset as the old way. Committing this for posterity but reverting my changes ultimately
* Revert "Halfway approach to the new indexfactory, but it doesn't have the same featureset as the old way. Committing this for posterity but reverting my changes ultimately"
This reverts commit 03dccb5994.
* Adding filtered search. API is going to change still.
* Further enhancements to the new filter capability in the static memory index.
* Ran automatic formatting
* Fixing my logic and ensuring the unit tests pass.
* Setting this up as a rc build first
* list[list[Hashable]] -> list[list[str]]
* Adding a partial solution for the case where we query for more items than exist in the filter set. We still need to replicate this behavior across all indices: dynamic, static disk, static memory without filters, etc.
* Removing the import of Hashable too
* read from MemoryMappedFile when EXEC_ENV_OLS is defined
* fix calls to is_open/close, which stringstream does not have
* fix formatting to comply with clang-format
* fix labels.yml: create tmp directory before search_disk_index is run
* fix to reset stream after reads
* Add bool param for building a graph of labeled data
* Add arguments for building labeled index
* Pass arguments for labeled index
* Light renaming
* Handle labels in insert_point
* Fix missing semicolon
* Add initial label handling logic
* Use unlabeled algo for uniquely labeled point
* Ignore frozen points when checking labels
* Fix missing newline
* Move label-specific logic to threadsafe zone
* Check for frozen points when asserting on num points and num labeled points
* Fix file name concatenation for label metadata
* inmem_graph_store initial impl
* Use Lbuild to append to pruned_list during filter build
* Add label counts for deleting from streaming index
* Fix typo
* Fix conditions for testing
* Add medoid search to support deleting label medoids from graph
* resolving error with bfs_medoid_search()
* trying to create 2 pruned_lists and combine them
* Clear pool between calls to search_for_point_and_prune. Fix integer math
* Update pruned_list algo for link method
* making fz_points the medoids for labels encountered
* repositioning medoids as well because they are fz points when compacting data
* removing unneeded method
* rebasing from main
* adding tests in yml workflow for dynamic index with labels
* quick fix
* removing combining of unfiltered + filtered list for now
* trying to resolve disk search poor performance
* increasing L size while searching disk index
* minor rollback
* updating dynamic-label to not use tag file while computing GT
* altering some test search L values
* adding unfiltered search for filtered batch build index
* adding compute GT for Zipf-distributed labels in labels workflow
* searching filtered streaming index with popular label for now
* reposition fz points as medoids for filtered dynamic build
* minor renaming vars
* separate functions for insert point with labels and without labels
* clang error fix
* barebones of in mem graph store
* refactoring index to use index factory
* clang format fix
* Windows build fix
* converting enum to enum class (C++11 style) for scope resolution with same enum values
* cleaning up API for GraphStore
* resolving comments
* clang error fix
* adding some comments
* moving _nd back to index class
* removing function to reposition medoids; it's not required, as it is incorporated into reposition_points
* altering -L (32->5) and -R (16->32) while building filtered disk index to work well with modified connections in the algorithm
* updating docs -> dynamic_index.md to have info on how to build and search filtered dynamic index
* updating docs
* updating _pts_to_labels when repositioning fz_points
* error fix
* clang fix
* making sure _pts_to_labels are not empty
* fixing dynamic-label build error
* code improvements
* adding logic for test_ins_del_consolidate to support filtered index
* resolving PR comments
* error fix
* error fix for dynamic
* now test insert delete consolidate supports building filters
* lowering recall in case of test insert delete consolidate
* resolving PR comments
* removing _num_frozen_point from graph store
* minor fix
* moving _start back to main + minor update in graph store api to support that
* adding a lock before detect_common_filter + minor naming improvement
* adding requested changes from Gopal
* removing reservations
* resolving namespace resolution for defaults after build failure
* minor update
* minor update
* speeding up location update logic while repositioning
* updated with reserving mem for graph neighbours upfront
* build error fix
* minor update in assert
* initial commit
* updating python bindings to use new ctor
* python binding error fix
* error fix
* reverting some changes -> experiment
* removing redundant code from native index
* python build error fix
* trying to resolve python build error
* attempt at python build fix
* adding IndexSearchParams
* setting search threads to non zero
* minor check removed
* experiment 3 -> making distance fully owned by data_store
* exp 3 clang fix
* exp 4
* making distance as unique_ptr
* trying to fix build
* finally fixing problem
* some minor fix
* adding dll export to index_factory static function
* adding dll export for static fn in index_factory
* code cleanup
* resolving errors after merge
* resolving build errors
* fixing build error for stitched index
* resolving build errors
* removing max_observed_degree set()
* removing comments + typo fix
* replacing add_neighbour with set_neighbours where we can
* error fix
* minor fix
* fixing error introduced while rebasing
* fixing error for dynamic filtered index
* resolving dynamic build deadlock error
* resolving error with test_insert_del_consolidate for dynamic filter build
* minor code cleanup
* refactoring fz_pts and filter_index to be property of IndexConfig and hence Index
* removing write_params from build()
* removing write_params from build and taking it upfront in Index Ctor
* minor fix
* renaming build_params to filter params
* fixing errors on auto merge
* auto decide universal_label experiment
* resolving bug with universal label
* resolving dynamic labels error, if there are unused fz points
* exposing set_universal_label() through abstract index
* minor update: sanity check
* minor update to search
* including tag file while computing GT
* generating compacted label file and using it in generate GT
* minor fix
* resolving New PR comments (minor typo fixes)
* renaming _pts_to_labels to _tag_to_labels + adding a warning for consolidate deletes and quality of index
* minor name change + code cleanup
* clang format fix
* adding locks for filter data_structures
* avoiding deadlock
* universal label definition update
* reverting locks on _location_to_labels as it's causing problems with large datasets
* adding locks for _label_to_medoid_id
* Update dynamic_index.md
* Update dynamic-labels.yml
* renaming some variables
---------
Co-authored-by: David Kaczynski <dkaczynski@microsoft.com>
Co-authored-by: yashpatel007 <patelyash1311@gmail.com>
Co-authored-by: Yash Patel <47032340+yashpatel007@users.noreply.github.com>
Co-authored-by: Harsha Vardhan Simhadri <harsha-simhadri@users.noreply.github.com>
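The switch from `enum` to `enum class` noted above matters when two enums share enumerator names. A plain `enum` injects its enumerators into the enclosing scope, so two enums that both declare `MEMORY` fail to compile. A scoped `enum class` keeps them apart. The enum names below are assumptions, modeled loosely on the data-store and graph-store strategies used by the index factory.

```cpp
#include <cassert>
#include <string>

// Illustrative enums (names assumed). With plain `enum`, both would inject
// MEMORY into the enclosing scope and the second declaration would not compile.
enum class DataStoreStrategy { MEMORY };
enum class GraphStoreStrategy { MEMORY };

std::string describe(DataStoreStrategy s) {
    switch (s) {
        case DataStoreStrategy::MEMORY: return "in-memory data store";
    }
    return "unknown";
}

std::string describe(GraphStoreStrategy s) {
    switch (s) {
        case GraphStoreStrategy::MEMORY: return "in-memory graph store";
    }
    return "unknown";
}
```

Because the two enumerators have distinct types, overloads such as `describe` resolve unambiguously. Neither enumerator converts implicitly to `int`.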
* Fixes #432: a bug in using OpenMP with gcc, where omp_get_num_threads() reports only the number of threads collaborating on the current code region, not the number available overall. I made this error when I transitioned us from omp_get_num_procs() about 5 or 6 months ago, and only with bug #432 did I really see how problematic my naive expectations were.
* Removed cosine distance metric from disk index until we can properly fix it in pqflashindex. Documented what distance metrics can be used with what vector dtypes in tables in the documentation.
* made changes to clean up filter number conversion, and fixed bug with universal filter search
* minor typecast fix
---------
Co-authored-by: rakri <rakri@microsoft.com>
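The #432 fix above hinges on which OpenMP query to use for a default thread count. `omp_get_num_threads()` returns the size of the current team, which is 1 outside a parallel region. `omp_get_num_procs()` returns the number of processors available. The sketch below is a minimal illustration under that reading. `default_num_threads` is a hypothetical helper. The non-OpenMP fallback to `std::thread::hardware_concurrency()` is an assumption added so the snippet also builds without `-fopenmp`.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: default number of worker threads.
// Do NOT use omp_get_num_threads() here; outside a parallel region it returns 1.
unsigned default_num_threads() {
#ifdef _OPENMP
    return static_cast<unsigned>(std::max(1, omp_get_num_procs()));
#else
    // hardware_concurrency() may return 0 when the value is not computable.
    return std::max(1u, std::thread::hardware_concurrency());
#endif
}
```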
* inmem_graph_store initial impl
* barebones of in mem graph store
* refactoring index to use index factory
* clang format fix
* converting enum to enum class (C++11 style) for scope resolution with same enum values
* cleaning up API for GraphStore
* moving _nd back to index class
* resolving PR comments
* error fix
* error fix for dynamic
* resolving PR comments
* removing _num_frozen_point from graph store
* minor fix
* moving _start back to main + minor update in graph store api to support that
* adding requested changes from Gopal
* removing reservations
* resolving namespace resolution for defaults after build failure
* minor update
* minor update
* speeding up location update logic while repositioning
* updated with reserving mem for graph neighbours upfront
* build error fix
* minor update in assert
* initial commit
* updating python bindings to use new ctor
* python binding error fix
* error fix
* reverting some changes -> experiment
* removing redundant code from native index
* python build error fix
* trying to resolve python build error
* attempt at python build fix
* adding IndexSearchParams
* setting search threads to non zero
* minor check removed
* experiment 3 -> making distance fully owned by data_store
* exp 3 clang fix
* exp 4
* making distance as unique_ptr
* trying to fix build
* finally fixing problem
* some minor fix
* adding dll export to index_factory static function
* adding dll export for static fn in index_factory
* code cleanup
* resolving errors after merge
* resolving build errors
* fixing build error for stitched index
* resolving build errors
* removing max_observed_degree set()
* removing comments + typo fix
* replacing add_neighbour with set_neighbours where we can
* error fix
* move read_nodes to public, add get_pq_vector and get_num_points
* clang-format
* Match new private var naming convention
* more private (_) fixes
* VID->vid
* VID->vid cpp
* make sector node an inline function
* convert offset_node macro to inline method
* rename member vars to start with underscore in pq_flash_index.h
* added support in create_disk_index
* add read sector util
* load_cache_list now uses read_blocks util
* allow nullptr for read_nodes
* BFS cache generation uses util
* add num_sectors info to cached_beam_search
* add CI test for 1020-, 1024-, and 1536-D float and 4096-D int8 random vectors on disk
* Have a working dockerfile to run perf tests and report the times they take. We can also capture stdout/stderr with it for further information, especially for tools that report internal latencies.
* Slight changes to the perf test script, a perf.yml for the github action
* Added PDoc workflow
* Added documentation to the push-test workflow
* Added diskannpy to the env for pdoc to use
* Initial commit of doc publish workflow
* Tried heredoc to get python version
* Tried another way of getting the version
* Tried another way of getting the version
* Moved to docs/python path
* Removing the test harness
* Add dependencies per wheel
* Moved dependency tree to the 'push' file so it runs on push
* Added label name to the dependency file
* Trying matrix.os to get the OS and version
* Moved doc generation from push-test to python-release. Will add 'dev' doc generation to push-test
* Publish latest/version docs only on release. Publish docs for every dev build on main.
* Install the local-file version of the library
* Disable branch check so I can test the install
* Use python build to build a wheel for use in documentation
* Tried changing to python instead of python3
* Added checkout depth in order to get boost
* Use the python build action to create wheel for documentation
* Revert "Use the python build action to create wheel for documentation"
This reverts commit d900c1d42c.
* Added linux environment setup
* Made only publish dev when on main and added comments
---------
Co-authored-by: Jonathan McLean <Jonathan.McLean@microsoft.com>