DiskANN/tests
David Kaczynski ced3b4ff4e
Build streaming index of labeled data (#376)
* Add bool param for building a graph of labeled data

* Add arguments for building labeled index

* Pass arguments for labeled index

* Light renaming

* Handle labels in insert_point

* Fix missing semicolon

* Add initial label handling logic

* Use unlabeled algo for uniquely labeled point

* Ignore frozen points when checking labels

* Fix missing newline

* Move label-specific logic to threadsafe zone

* Check for frozen points when assert num points and num labeled points

* Fix file name concatenation for label metadata

* inmem_graph_store initial impl

* Use Lbuild to append to pruned_list during filter build

* Add label counts for deleting from streaming index

* Fix typo

* Fix conditions for testing

* Add medoid search to support deleting label medoids from graph

* resolving error with bfs_medoid_search()

* trying to create 2 pruned_lists and combine them

* Clear pool between calls to search_for_point_and_prune. Fix integer math

* Update pruned_list algo for link method

* making fz_points to be medoids for labels encountered

* repositioning medoids as well because they are fz points when compacting data

* removing unrequired method

* rebasing from main

* adding tests in yml workflow for dynamic index with labels

* quick fix

* removing combining of unfiltered + filtered list for now

* trying to resolve disk search poor performance

* increasing L size while searching disk index

* minor rollback

* updating dynamic-label to not use tag file while computing GT

* altering some test search L values

* adding unfiltered search for filtered batch build index

* adding compute gt for zipf dist labels in labels workflow

* searching filtered streaming index with popular label for now

* reposition fz points as medoids for filtered dynamic build

* minor renaming vars

* separate function for insert point with labels and without labels

* clang error fix

* barebones of in mem graph store

* refactoring index to use index factory

* clang format fix

* window build fix

* making enum to enum class (c++ 11 style) for scope resolution with same enum values

* cleaning up API for GraphStore

* resolving comments

* clang error fix

* adding some comments

* moving _nd back to index class

* removing function reposition medoids, it's not required, incorporated into reposition_points

* altering -L (32->5) and -R (16->32) while building filtered disk index to work well with modified connections in algo

* updating docs -> dynamic_index.md to have info on how to build and search filtered dynamic index

* updating docs

* updating _pts_to_labels when repositioning fz_points

* error fix

* clang fix

* making sure _pts_to_labels are not empty

* fixing dynamic-label build error

* code improvements

* adding logic for test_ins_del_consolidate to support filtered index

* resolving PR comments

* error fix

* error fix for dynamic

* now test insert delete consolidate support building filters

* lowering recall in case of test insert delete consolidate

* resolving PR comments

* removing _num_frozen_point from graph store

* minor fix

* moving _start back to main + minor update in graph store api to support that

* adding a lock before detect_common_filter + minor naming improvement

* adding requested changes from Gopal

* removing reservations

* resolving namespace resolution for defaults after build failure

* minor update

* minor update

* speeding up location update logic while repositioning

* updated with reserving mem for graph neighbours upfront

* build error fix

* minor update in assert

* initial commit

* updating python bindings to use new ctor

* python binding error fix

* error fix

* reverting some changes -> experiment

* removing redundant code from native index

* python build error fix

* trying to resolve python build error

* attempt at python build fix

* adding IndexSearchParams

* setting search threads to non zero

* minor check removed

* experiment 3-> making distance fully owned by data_store

* exp 3 clang fix

* exp 4

* making distance as unique_ptr

* trying to fix build

* finally fixing problem

* some minor fix

* adding dll export to index_factory static function

* adding dll export for static fn in index_factory

* code cleanup

* resolving errors after merge

* resolving build errors

* fixing build error for stitched index

* resolving build errors

* removing max_observed_degree set()

* removing comments + typo fix

* replacing add_neighbour with set_neighbours where we can

* error fix

* minor fix

* fixing error introduced while rebasing

* fixing error for dynamic filtered index

* resolving dynamic build deadlock error

* resolving error with test_insert_del_consolidate for dynamic filter build

* minor code cleanup

* refactoring fz_pts and filter_index to be property of IndexConfig and hence Index

* removing write_params from build()

* removing write_params from build and taking it upfront in Index Ctor

* minor fix

* renaming build_params to filter params

* fixing errors on auto merge

* auto decide universal_label experiment

* resolving bug with universal label

* resolving dynamic labels error, if there are unused fz points

* exposing set_universal_label() through abstract index

* minor update: sanity check

* minor update to search

* including tag file while computing GT

* generating compacted label file and using it in generate GT

* minor fix

* resolving New PR comments (minor typo fixes)

* renaming _pts_to_labels to _tag_to_labels + adding a warning for consolidate deletes and quality of index

* minor name change + code cleanup

* clang format fix

* adding locks for filter data_structures

* avoiding deadlock

* universal label definition update

* reverting locks on _location_to_labels as it's causing problems with large dataset

* adding locks for _label_to_medoid_id

* Update dynamic_index.md

* Update dynamic-labels.yml

* renaming some variables

---------

Co-authored-by: David Kaczynski <dkaczynski@microsoft.com>
Co-authored-by: yashpatel007 <patelyash1311@gmail.com>
Co-authored-by: Yash Patel <47032340+yashpatel007@users.noreply.github.com>
Co-authored-by: Harsha Vardhan Simhadri <harsha-simhadri@users.noreply.github.com>
2023-09-22 09:54:12 -07:00
CMakeLists.txt Add unit test project based on boost_unit_test_framework (#365) 2023-06-01 16:45:11 -07:00
README.md Add unit test project based on boost_unit_test_framework (#365) 2023-06-01 16:45:11 -07:00
index_write_parameters_builder_tests.cpp Build streaming index of labeled data (#376) 2023-09-22 09:54:12 -07:00
main.cpp Add unit test project based on boost_unit_test_framework (#365) 2023-06-01 16:45:11 -07:00

README.md

Unit Test project

This unit test project is based on the Boost unit test framework. Below are the simple steps for adding a new unit test; you can find more usage details in the Boost unit test documentation.

How to add a unit test