Commit graph

269 commits

Author SHA1 Message Date
gopalrs a2a0f6942a
Gopalsr/index dumper (#541)
* Scripts to parse DiskANN SSD index

* Removed unnecessary code check and fixed argparse for bool

* Added support for multi-sector nodes in the disk index

---------

Co-authored-by: Gopal Srinivasa <gopalsr@microsoft.com>
2024-05-22 12:18:16 +05:30
dependabot[bot] 8aedb3a3d4
Bump openssl from 0.10.55 to 0.10.60 in /rust (#496)
Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.55 to 0.10.60.
- [Release notes](https://github.com/sfackler/rust-openssl/releases)
- [Commits](https://github.com/sfackler/rust-openssl/compare/openssl-v0.10.55...openssl-v0.10.60)

---
updated-dependencies:
- dependency-name: openssl
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-19 14:04:06 +05:30
luyuncheng 4e0eb882b6
Fix PQScratch memory leak (#522)
* fix memory leak

* FIXED clang-format error

* FIXED SSDQueryScratch Destroy OOM
2024-04-02 18:44:33 +05:30
Huisheng Liu 9e87637634
wait on completeCount if callback is used (#532)
2024-03-26 13:55:27 +05:30
Huisheng Liu 2795f8524a
replace callback driven wait with new Wait() method (#526)
2024-03-07 13:07:50 -08:00
litan1 61846c08c2
Create in memory data store/graph store with at least max_points as 1 (#523)
* create in memory data store/graph store with at least max_points as 1

* fix code formatting
2024-03-06 00:17:19 +05:30
Michael Popov a25ee6f211
Add simplified functions for product quantization (#514)
* Add simplified functions for product quantization

* Fixing formatting errors

* Fixing clang-format issue

* Fixing another set of clang-format issues

---------

Co-authored-by: Michael Popov (from Dev Box) <mipopo@microsoft.com>
2024-02-27 13:58:05 -08:00
Huisheng Liu 340bc58b5a
add wait() method to AlignedFileReader (#518)
2024-02-26 11:49:35 -08:00
NingyuanChen 24581a4979
Bug fix for dlvs (#509)
* Fix small bugs for DLVS path.

* Easier for user to use.

---------

Co-authored-by: REDMOND\ninchen <ninchen@microsoft.com>
2024-02-07 14:59:20 -08:00
Dax Pryce 9500d5a787
Update push-test.yml (#512)
2024-02-07 14:57:52 -08:00
Dax Pryce 6e4569fb3d
Allow documentation to be published to our gh-pages branch (#511)
2024-02-06 11:50:42 -08:00
Dax Pryce df225d32da
Version bump 0.7.0rc2->0.7.0 (#510)
* Version bump 0.7.0rc2->0.7.0

Preparing diskannpy for 0.7.0 release (filter support, static memory indices only)

* Update pyproject.toml

the GPG key from (presumably) 2019 is no longer valid

* Update pyproject.toml

* Update python-release.yml

By default, GITHUB_TOKEN no longer has write permissions - you have to explicitly ask for it in the specific job that needs it.

We use write permissions to update the Github release action that updates the published build artifacts with the results of the release flow.
2024-02-06 11:13:43 -08:00
rakri 13df0cf7c7
Rakri/cosine bug fix (#450)
* compiles, but need to verify

* fixed windows compiler warning

* minor typo

* added cosine unit test with unnormalized data

* minor typo in user prompt cosine/l2

* cosine was already supported in groundtruth, edited the message to say so

* clang-format

---------

Co-authored-by: rakri <rakri@microsoft.com>
2024-02-06 18:09:30 +05:30
Jerry Gao 58de98dc73
add16bytes tag type (#506)
* add 16 bytes tag type

* clean up code

* format doc

* fix compile issue

* fix compile issue

* revert change

* format doc

* separate static search and streaming search 

* clean up code

* resolve comment

* format doc

* fix test

* resolve comment
2024-02-06 15:54:05 +05:30
Andrija Antonijevic 5cf0360d7e
Fix calculation of current_point_offset in test_insert_consolidate_deletes (#501)
The program builds the streaming index after two optional steps: 1) skipping S points from the input file and 2) batch building of initial index using B points from the input file.

After these two steps, the offset to the input file should be S + B, but the current code first sets it to S in line 163 then overwrites it to B in line 249, instead of adding B to the offset. The tool which `test_insert_deletes_consolidate` was based on was using `+=` in the modified line.
2024-01-26 09:45:18 +05:30
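
The offset arithmetic described in the commit above (#501) can be illustrated with a minimal sketch. The function and parameter names here are hypothetical and are not taken from the DiskANN sources; only `current_point_offset` and the `+=` fix come from the commit message, and the sketch is in Python for brevity rather than the tool's own language.

```python
def offset_with_overwrite(skip_points: int, batch_points: int) -> int:
    """Mimic the bug described in the commit: the second assignment
    overwrites the offset instead of adding to it."""
    current_point_offset = skip_points    # step 1: skip S points
    current_point_offset = batch_points   # step 2 (bug): S is lost, offset becomes B
    return current_point_offset


def offset_with_increment(skip_points: int, batch_points: int) -> int:
    """Mimic the fix: the batch size is added to the existing offset."""
    current_point_offset = skip_points    # step 1: skip S points
    current_point_offset += batch_points  # step 2 (fix): offset becomes S + B
    return current_point_offset


if __name__ == "__main__":
    S, B = 100, 500  # illustrative values only
    print(offset_with_overwrite(S, B))   # 500, i.e. B
    print(offset_with_increment(S, B))   # 600, i.e. S + B
```

The two versions agree whenever S is 0, which is one plausible reason an overwrite of this kind can go unnoticed when no points are skipped.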
dependabot[bot] 38cf26d88e
Bump zerocopy from 0.6.1 to 0.6.6 in /rust (#499)
Bumps [zerocopy](https://github.com/google/zerocopy) from 0.6.1 to 0.6.6.
- [Release notes](https://github.com/google/zerocopy/releases)
- [Changelog](https://github.com/google/zerocopy/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google/zerocopy/commits)

---
updated-dependencies:
- dependency-name: zerocopy
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-18 00:49:50 -06:00
rakri 57440609e9
Adding a new PQ Distance Metric and PQ Data Store (#384)
* Added PQ distance hierarchy

Changes to CMakelists

PQDataStore version that builds correctly

Clang-format

* Fixing compile issues after rebase to main

* minor renaming functions

* fixed small bug post rebasing with index factory

* Changes to index factory to support PQDataStore

* Merged graph_store and pq_data_store

* Implementing preprocessing for inmemdatastore

* Incorporating code review comments

* minor bugfix for PQ data allocation

* clang-formatted

* Incorporating CR comments

* Fixing compile error

* minor bug fix + clang-format

* Update pq.h

* Fixing warnings about struct/class incompatibility

---------

Co-authored-by: Gopal Srinivasa <gopalsr@microsoft.com>
Co-authored-by: ravishankar <rakri@microsoft.com>
Co-authored-by: gopalrs <33950290+gopalrs@users.noreply.github.com>
2023-12-05 13:18:27 +05:30
jinwei14 03abc71205
Use TCMalloc to fix system memory leak (#494)
* add fix for memory leak

* cmake change for enable tcmalloc

* add hot fix for cmake for boost and tcmalloc

* fix indentation

* identitation

* change camke set on after cmake_minimum_required

* unset tcmalloc for PYBIND

* unset envirvariable beforehead

* set off

* exlucde the compile def for pybind

* disable for pybind
2023-12-01 14:48:45 +05:30
Siddharth Gollapudi 87990dacfb
Address race condition in `iterate_to_fixed_point` (#478)
Co-authored-by: Siddharth Gollapudi <t-gollapudis@microsoft.com>
2023-11-23 13:33:47 -08:00
Shawn Zhong b2a595ceea
Handle io_setup error properly (#465)
2023-11-20 15:28:02 -08:00
Dax Pryce 35f8cf7546
Fixing index_prefix_path bug in python for StaticMemoryIndex (#491)
* Fixing the same bug I had in static disk index inside of static memory index as well.

* Unit tests and a better understanding of why the unit tests were successful despite this bug
2023-11-09 15:05:59 -08:00
Dax Pryce 4a57e8931b
Adding Filtered Index support to Python bindings (#482)
* Halfway approach to the new indexfactory, but it doesn't have the same featureset as the old way. Committing this for posterity but reverting my changes ultimately

* Revert "Halfway approach to the new indexfactory, but it doesn't have the same featureset as the old way. Committing this for posterity but reverting my changes ultimately"

This reverts commit 03dccb5994.

* Adding filtered search. API is going to change still.

* Further enhancements to the new filter capability in the static memory index.

* Ran automatic formatting

* Fixing my logic and ensuring the unit tests pass.

* Setting this up as a rc build first

* list[list[Hashable]] -> list[list[str]]

* Adding halfway to a solution where we query for more items than exist in the filter set. We need to replicate this behavior across all indices though - dynamic, static disk and memory w/o filters, etc

* Removing the import of Hashable too
2023-11-07 09:28:04 -08:00
Xiangyu Wang 179927ed35
correct index_path_prefix in __init__ function of static disk index (#483)
2023-11-06 09:45:59 -08:00
dependabot[bot] 3d58cebf98
Bump rustix from 0.37.20 to 0.37.25 in /rust (#479)
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.37.20 to 0.37.25.
- [Release notes](https://github.com/bytecodealliance/rustix/releases)
- [Commits](https://github.com/bytecodealliance/rustix/compare/v0.37.20...v0.37.25)

---
updated-dependencies:
- dependency-name: rustix
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-19 21:11:56 -07:00
Jerry Gao ed9466ca13
read file in one time (#460)
* read whole label file to memory, use string find instead stringstream

* format doc
2023-10-18 20:37:25 -07:00
Huisheng Liu 720c45c314
rename 'content' variable to avoid duplicates (#475)
2023-10-16 09:04:52 -07:00
Huisheng Liu c47b3ac1ed
read from MemoryMappedFile when EXEC_ENV_OLS is defined (#471)
* read from MemoryMappedFile when EXEC_ENV_OLS is defined

* fix is_open/close which stringstream does not have

* fix formating to comply with clang

* fix labels.yml: create tmp directory before search_diskk_index is run

* fix to reset stream after reads
2023-10-13 10:21:14 -07:00
Huisheng Liu a5334dd89e
add check for .enc extension to support encryption (#467)
* add check for .enc extension to support encryption

* check rotation_matrix file in file blobs
2023-10-04 11:35:49 -07:00
Shawn Zhong dee332df38
Fix typo in SSD_index.md (#466)
2023-10-04 11:25:49 -07:00
David Kaczynski ced3b4ff4e
Build streaming index of labeled data (#376)
* Add bool param for building a graph of labeled data

* Add arguments for building labeled index

* Pass arguments for labeled index

* Light renaming

* Handle labels in insert_point

* Fix missing semicolon

* Add initial label handling logic

* Use unlabeled algo for uniquely labeled point

* Ignore frozen points when checking labels

* Fix missing newline

* Move label-specific logic to threadsafe zone

* Check for frozen points when assert num points and num labeled points

* Fix file name concatenation for label metadata

* inmem_graph_store initial impl

* Use Lbuild to append to pruned_list during filter build

* Add label counts for deleting from streaming index

* Fix typo

* Fix conditions for testing

* Add medoid search to support deleting label medoids from graph

* resolvig error with bfs_medoid_search()

* trying to create 2 pruned_lists and combine them

* Clear pool between calls to search_for_point_and_prune. Fix integer math

* Update pruned_list algo for link method

* making fz_points to be medoids for labels encountered

* repositioning medoids as well because they are fz points when compacting data

* removing unrequired method

* rebasing from main

* adding tests in yml workflow for dynamic index with labels

* quick fix

* removing combining of unfiltered + filtered list for now

* trying to resolve disk search poor performance

* incleasing L size while searching disk index

* minor roolback

* updating dynamic-label to not use tag file while computing GT

* altering some test search L values

* adding unfiltered search for filtered batch build index

* adding compute gt for zipf dist labels in labsls wowrkflow

* searching filtered streaming index with popular label for now

* reposition fz points as medoids for filtered dynamic build

* minor renaming vars

* seoparate functio for insert opoint with labels and without labels

* clang error fix

* barebones of in mem graph store

* refactoring index to use index factory

* clang format fix

* window build fix

* making enum to enum class (c++ 11 style) for scope resolution with same enum values

* cleaning up API for GraphSore

* resolving comments

* clang error fix

* adding some comments

* moving _nd back to index class

* removing funcrion reposition medoidds its not required, incorporated into reposition_points

* altering -L (32->5) and -R (16->32) whhile building filterted disk index to work well with modified connections in algo

* updating docs -> dynamic_index.md to have info on how to build and search filtered dynamic index

* updating docs

* updateing _pts_to_labels when repositioning fz_points

* error fix

* clang fix

* making sure _pts_to_labels are not empty

* fixing dynamic-label build error

* code improvements

* adding logic for test_ins_del_consolidate to support filtered index

* resolving PR comments

* error fix

* error fix for dynamic

* now test insert delete consolidate support building filters

* lowering recal in case of test insert delete consolidte

* resolving PR comments

* removing _num_frozen_point from graph store

* minor fix

* moving _start back to main + minor update in graph store api to support that

* adding a lock before detect_common_filter + minor naming improvement

* adding requested changes from Gopal

* removing reservations

* resolving namespace resolution for defaults after build failure

* minor update

* minor update

* speeding up location update logic while repositioning

* updated with reserving mem for graph neighbours upfront

* build error fix

* minor update in assert

* initial commit

* updating python bindings to use new ctor

* python binding error fix

* error fix

* reverting some changes -> experiment

* removing redundnt code from native index

* python build error fix

* tyring to resolve python build error

* attempt at python build fix

* adding IndexSearchParams

* setting search threads to non zero

* minor check removed

* eperiment 3-> making distance fully owned by data_store

* exp 3 clang fix

* exp 4

* making distance as unique_ptr

* trying to fix build

* finally fixing problem

* some minor fix

* adding dll export to index_factory static function

* adding dll export for static fn in index_factory

* code cleanup

* resolving errors after merge

* resolving build errors

* fixing build error for stitched index

* resolving build errors

* removing max_observed_degree set()

* removing comments + typo fix

* replacing add_neighbour with set_neighbours where we can

* error fix

* minor fix

* fixing error introduced while rebasing

* fixing error for dynamic filtered index

* resolving dynamic build deadlick error

* resolving error with test_insert_del_consolidate for dynamic filter build

* minor code cleanup

* refactoring fz_pts and filter_index to be property of IndexConfig and hence Index

* removing write_params from build()

* removing write_params from buidl and taking it upfront in Index Ctor

* minor fix

* renaming build_params to filter params

* fixing errors on auto merge

* auto decide universal_label experiment

* resolving bug with universal lable

* resolving dynamic labels error, if there are unused fz points

* exposing set_universal_label() through abstract index

* minor update: sanity check

* minor update to search

* including tag file while computing GT

* generating compacted label file and using it in generate GT

* minor fix

* resolving New PR comments (minor typo fixes)

* renaming _pts_to_labels to _tag_to_labels + adding a warning for consolidate deletes and quality of index

* minor name chnage + code cleanup

* clang format fix

* adding locks for filter data_structures

* avoiding deadock

* universal label defination update

* reverting locks on _location_to_labels as its causing problems with large dataset

* adding locks for _label_to_medoid_id

* Update dynamic_index.md

* Update dynamic-labels.yml

* renaming some variables

---------

Co-authored-by: David Kaczynski <dkaczynski@microsoft.com>
Co-authored-by: yashpatel007 <patelyash1311@gmail.com>
Co-authored-by: Yash Patel <47032340+yashpatel007@users.noreply.github.com>
Co-authored-by: Harsha Vardhan Simhadri <harsha-simhadri@users.noreply.github.com>
2023-09-22 09:54:12 -07:00
Jon McLean b8b6caf8c2
Release documentation from the release tag instead of main (#448)
2023-08-31 09:01:16 -07:00
Dax Pryce 4c31367b6c
Preparing for 0.6.1 release (#447)
2023-08-30 15:23:57 -07:00
Dax Pryce a112411efe
Fixes #432, bug in using openmp with gcc and omp_get_num_threads() (#445)
* Fixes #432, bug in using openmp with gcc and omp_get_num_threads() only reporting the number of threads collaborating on the current code region not available overall. I made this error and transitioned us from omp_get_num_procs() about 5 or 6 months ago and only with bug #432 did I really get to see how problematic my naive expectations were.

* Removed cosine distance metric from disk index until we can properly fix it in pqflashindex. Documented what distance metrics can be used with what vector dtypes in tables in the documentation.
2023-08-30 13:51:22 -07:00
rakri fa6c27970a
working draft PR for cleaning up disk based filter search (#414)
* made changes to clean up filter number conversion, and fixed bug with universal filter search

* minor typecast fix

---------

Co-authored-by: rakri <rakri@microsoft.com>
2023-08-30 15:02:34 +05:30
Dax Pryce 353e538f45
Type hints and returns actually align this time. (#444)
2023-08-29 15:49:30 -07:00
Yash Patel 8afb38a1e1
Remove IndexWriteParams from build method. (#441)
* removing write_params from buidl and taking it upfront in Index Ctor

* renaming build_params to filter params
2023-08-28 10:22:29 -07:00
Jon McLean 98b119a248
Added clarity to the universal label (#442)
2023-08-28 22:16:02 +05:30
Harsha Vardhan Simhadri b05c2dcef0
add num_Threads to indexwriteparams in sharded build (#438)
2023-08-24 09:59:15 -07:00
Yash Patel 9622d8f6d5
hot fix definate mem_leaks (#440)
2023-08-23 13:23:42 -07:00
Harsha Vardhan Simhadri fee17e6a34
Reduce CI tests for multi-sector disk layout from 10K to 5K points so… (#439)
* Reduce CI tests for multi-sector disk layout from 10K to 5K points so they run faster

* turn off 1024D
2023-08-22 15:02:39 -07:00
Harsha Vardhan Simhadri 9d5fde183b
Undo mistake, let frontier read in PQ flash index be asynchronous (#434)
* Undo mistake, let frontier read in PQ flash index be asynchronous

* address changes requested
2023-08-22 18:53:46 +05:30
Yash Patel 4162c21189
In Memory Graph Store (#395)
* inmem_graph_store initial impl

* barebones of in mem graph store

* refactoring index to use index factory

* clang format fix

* making enum to enum class (c++ 11 style) for scope resolution with same enum values

* cleaning up API for GraphSore

* moving _nd back to index class

* resolving PR comments

* error fix

* error fix for dynamic

* resolving PR comments

* removing _num_frozen_point from graph store

* minor fix

* moving _start back to main + minor update in graph store api to support that

* adding requested changes from Gopal

* removing reservations

* resolving namespace resolution for defaults after build failure

* minor update

* minor update

* speeding up location update logic while repositioning

* updated with reserving mem for graph neighbours upfront

* build error fix

* minor update in assert

* initial commit

* updating python bindings to use new ctor

* python binding error fix

* error fix

* reverting some changes -> experiment

* removing redundnt code from native index

* python build error fix

* tyring to resolve python build error

* attempt at python build fix

* adding IndexSearchParams

* setting search threads to non zero

* minor check removed

* eperiment 3-> making distance fully owned by data_store

* exp 3 clang fix

* exp 4

* making distance as unique_ptr

* trying to fix build

* finally fixing problem

* some minor fix

* adding dll export to index_factory static function

* adding dll export for static fn in index_factory

* code cleanup

* resolving errors after merge

* resolving build errors

* fixing build error for stitched index

* resolving build errors

* removing max_observed_degree set()

* removing comments + typo fix

* replacing add_neighbour with set_neighbours where we can

* error fix
2023-08-17 14:15:53 -07:00
Philip Adams df7c5303d6
fix OLS build (#428)
* fix OLS build

* Add a build to CI with feature flags enabled
2023-08-17 13:45:14 -07:00
Philip Adams 39b3330496
Add convenience functions for parsing the PQ index (#349)
* move read_nodes to public, add get_pq_vector and get_num_points

* clang-format

* Match new private var naming convention

* more private (_) fixes

* VID->vid

* VID->vid cpp
2023-08-15 15:36:57 -07:00
Yash Patel 6d4e2bfa72
Consolidate Index Constructors (#418)
* initial commit

* updating python bindings to use new ctor

* python binding error fix

* error fix

* reverting some changes -> experiment

* removing redundnt code from native index

* python build error fix

* tyring to resolve python build error

* attempt at python build fix

* adding IndexSearchParams

* setting search threads to non zero

* minor check removed

* eperiment 3-> making distance fully owned by data_store

* exp 3 clang fix

* exp 4

* making distance as unique_ptr

* trying to fix build

* finally fixing problem

* some minor fix

* adding dll export to index_factory static function

* adding dll export for static fn in index_factory

* code cleanup

* resolving gopal's comments

* resolving build failures
2023-08-15 12:58:31 -07:00
Harsha Vardhan Simhadri 977dd3cd20
allow multi-sector layout for large vectors (#417)
* make sector node an inline function

* convert offset_node macro to inline method

* rename member vars to start with underscore in pq_flash_index.h

* added support in create_disk_index

* add read sector util

* load_cache_list now uses read_blocks util

* allow nullptr for read_nodes

* BFS cache generation uses util

* add num_sectors info to cache_beam_Search

* add CI test for 1020,1024,1536D float and 4096D int8 rand vector on disk
2023-08-14 16:30:42 -07:00
Dax Pryce c729e5c6b7
Add Performance Tests (#421)
* Have a working dockerfile to run perf tests and report the times they take. We can also capture stdout/stderr with it for further information, especially for tools that report internal latencies.

* Slight changes to the perf test script, a perf.yml for the github action
2023-08-11 11:54:05 -07:00
Harsha Vardhan Simhadri b572571888
moved ssd index defaults to defaults.h (#415)
* moved ssd index constants to defaults.h
2023-08-08 15:00:51 -07:00
Harsha Vardhan Simhadri 637ed515aa
Update README.md (#416)
2023-08-08 13:08:33 -07:00
Jon McLean 3f58b99777
Added PDoc workflow to publish github pages documentation (#412)
* Added PDoc workflow

* Added documentation to the push-test workflow

* Added diskannpy to the env for pdoc to use

* Initial commit of doc publish workflow

* Tried heredoc to get python version

* Tried another way of getting the version

* Tried another way of getting the version

* Moved to docs/python path

* Removing the test harness

* Add dependencies per wheel

* Moved dependency tree to the 'push' file so it runs on push

* Added label name to the dependency file

* Trying maxtrix.os to get the os and version

* Moved doc generation from push-test to python-release.  Will add 'dev' doc generation to push-test

* Publish latest/version docs only on release.  Publish docs for every dev build on main.

* Install the local-file version of the library

* Disable branch check so I can test the install

* Use python build to build a wheel for use in documentation

* Tried changing to python instead of python3

* Added checkout depth in order to get boost

* Use the python build action to create wheel for documentation

* Revert "Use the python build action to create wheel for documentation"

This reverts commit d900c1d42c.

* Added linux environment setup

* Made only publish dev when on main and added comments

---------

Co-authored-by: Jonathan McLean <Jonathan.McLean@microsoft.com>
2023-08-04 09:21:57 -07:00