Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search
Andrija Antonijevic 5cf0360d7e
Fix calculation of current_point_offset in test_insert_consolidate_deletes (#501)
The program builds the streaming index after two optional steps: 1) skipping S points from the input file and 2) batch-building an initial index from B points of the input file.

After these two steps, the offset into the input file should be S + B, but the current code first sets it to S at line 163 and then overwrites it with B at line 249 instead of adding B to it. The tool that `test_insert_deletes_consolidate` was based on used `+=` on the modified line.
2024-01-26 09:45:18 +05:30
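To make the offset arithmetic in this commit message concrete, here is a minimal C++ sketch of the two patterns it describes. The function names `offset_before_fix` and `offset_after_fix` are hypothetical and do not come from the DiskANN sources; only the variable name `current_point_offset` mirrors the commit title, and it assumes the B build points are the ones that follow the S skipped points. The real program's types and surrounding logic may differ.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; they are not DiskANN functions.
// S = number of points skipped at the start of the input file.
// B = number of points (assumed to follow the skipped ones) used to
//     batch-build the initial index.

// Pattern before the fix: the second assignment replaces S with B.
constexpr std::size_t offset_before_fix(std::size_t S, std::size_t B)
{
    std::size_t current_point_offset = S; // after skipping S points
    current_point_offset = B;             // bug: overwrites the skipped count
    return current_point_offset;
}

// Pattern after the fix: advance past the B build points instead.
constexpr std::size_t offset_after_fix(std::size_t S, std::size_t B)
{
    std::size_t current_point_offset = S; // after skipping S points
    current_point_offset += B;            // now S + B
    return current_point_offset;
}

// Worked example, checked at compile time: S = 1000, B = 5000.
static_assert(offset_before_fix(1000, 5000) == 5000, "old pattern loses S");
static_assert(offset_after_fix(1000, 5000) == 6000, "fixed pattern yields S + B");
```

With S = 1000 and B = 5000, the old pattern resumes at point 5000, which lies inside the range [1000, 6000) already consumed by the initial build, while the fixed pattern resumes at point 6000. The two patterns agree only when S = 0, which may be why the bug was easy to miss.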
.github read from MemoryMappedFile when EXEC_ENV_OLS is defined (#471) 2023-10-13 10:21:14 -07:00
AnyBuildLogs read file in one time (#460) 2023-10-18 20:37:25 -07:00
apps Fix calculation of current_point_offset in test_insert_consolidate_deletes (#501) 2024-01-26 09:45:18 +05:30
gperftools@fe85bbdf4c Update for vamana (bulk and fresh) from kann-experiments. 2022-05-12 09:20:55 -07:00
include Adding a new PQ Distance Metric and PQ Data Store (#384) 2023-12-05 13:18:27 +05:30
python Fixing index_prefix_path bug in python for StaticMemoryIndex (#491) 2023-11-09 15:05:59 -08:00
rust Bump zerocopy from 0.6.1 to 0.6.6 in /rust (#499) 2023-12-18 00:49:50 -06:00
scripts Add Performance Tests (#421) 2023-08-11 11:54:05 -07:00
src Adding a new PQ Distance Metric and PQ Data Store (#384) 2023-12-05 13:18:27 +05:30
tests Build streaming index of labeled data (#376) 2023-09-22 09:54:12 -07:00
windows Add unit test project based on boost_unit_test_framework (#365) 2023-06-01 16:45:11 -07:00
workflows Fix typo in SSD_index.md (#466) 2023-10-04 11:25:49 -07:00
.clang-format Clang-format now errors on push and PR if formatting is incorrect (#236) 2023-03-17 13:39:48 -07:00
.gitattributes diskann code 2020-09-05 00:25:44 -07:00
.gitignore Fixed param documentation (#393) 2023-07-17 10:02:10 -07:00
.gitmodules Update for vamana (bulk and fresh) from kann-experiments. 2022-05-12 09:20:55 -07:00
CMakeLists.txt Use TCMalloc to fix system memory leak (#494) 2023-12-01 14:48:45 +05:30
CMakeSettings.json Rebasing main's latest commits onto ravi/filter_support_rebased (#225) 2023-03-15 13:49:48 -07:00
CODE_OF_CONDUCT.md diskann code 2020-09-05 00:25:44 -07:00
CONTRIBUTING.md Rename CONTRIBUTING.MD to CONTRIBUTING.md 2022-02-25 13:18:15 -08:00
Dockerfile updated dockerfile (#299) 2023-04-05 14:12:05 -07:00
DockerfileDev Add unit test project based on boost_unit_test_framework (#365) 2023-06-01 16:45:11 -07:00
LICENSE diskann code 2020-09-05 00:25:44 -07:00
MANIFEST.in New python interface, build setup, apps and unit tests (#308) 2023-04-27 13:41:04 -07:00
NOTICE.txt diskann code 2020-09-05 00:25:44 -07:00
README.md Preparing for 0.6.1 release (#447) 2023-08-30 15:23:57 -07:00
SECURITY.md diskann code 2020-09-05 00:25:44 -07:00
clang-format.cmake Improve help formatting in CLI tools (#390) 2023-07-19 10:13:20 -07:00
pyproject.toml Fixing index_prefix_path bug in python for StaticMemoryIndex (#491) 2023-11-09 15:05:59 -08:00
setup.py Enable Windows python bindings (#343) 2023-05-09 13:14:46 -07:00
unit_tester.sh Harshasi/reorg scratch simplify build rebase with main (#152) 2022-11-22 09:20:10 -08:00

README.md

DiskANN

License: MIT

DiskANN is a suite of scalable, accurate and cost-effective approximate nearest neighbor search algorithms for large-scale vector search that support real-time changes and simple filters. This code is based on ideas from the DiskANN, Fresh-DiskANN and Filtered-DiskANN papers, with further improvements. It was originally forked from the code for the NSG algorithm.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

See CONTRIBUTING.md for guidelines on contributing to this project.

Linux build:

Install the following packages through apt:

sudo apt install make cmake g++ libaio-dev libgoogle-perftools-dev clang-format libboost-all-dev

Install Intel MKL

Ubuntu 20.04 or newer

sudo apt install libmkl-full-dev

Earlier versions of Ubuntu

Install Intel MKL either by downloading the oneAPI MKL installer or using apt (we tested with build 2019.4-070 and 2022.1.2.146).

# OneAPI MKL Installer
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18487/l_BaseKit_p_2022.1.2.146.sh
sudo sh l_BaseKit_p_2022.1.2.146.sh -a --components intel.oneapi.lin.mkl.devel --action install --eula accept -s

Build

mkdir build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make -j 

Windows build:

The Windows version has been tested with Enterprise editions of Visual Studio 2022, 2019 and 2017. It should work with the Community and Professional editions as well without any changes.

Prerequisites:

  • CMake 3.15+ (available in Visual Studio 2019+ or from https://cmake.org)
  • NuGet.exe (install from https://www.nuget.org/downloads)
    • The build script will use NuGet to get MKL, OpenMP and Boost packages.
  • DiskANN git repository checked out together with submodules. To check out submodules after git clone:
git submodule init
git submodule update
  • Environment variables:
    • [optional] If you would like to override the Boost library listed in windows/packages.config.in, set BOOST_ROOT to your Boost folder.

Build steps:

  • Open the "x64 Native Tools Command Prompt for VS 2019" (or the corresponding version) and change to the DiskANN folder
  • Create a "build" directory inside it
  • Change to the "build" directory and run
cmake ..

OR for Visual Studio 2017 and earlier:

<full-path-to-installed-cmake>\cmake ..

This will create a diskann.sln solution. Now you can:

  • Open it from Visual Studio and build either the Release or Debug configuration.
  • Build with CMake from the DiskANN folder:
<full-path-to-installed-cmake>\cmake --build build
  • Use MSBuild:
msbuild.exe diskann.sln /m /nologo /t:Build /p:Configuration="Release" /property:Platform="x64"
  • This will also build the gperftools submodule for the libtcmalloc_minimal dependency.
  • Generated binaries are stored in the x64/Release or x64/Debug directories.

Usage:

Please see the following pages on using the compiled code:

Please cite this software in your work as:

@misc{diskann-github,
   author = {Simhadri, Harsha Vardhan and Krishnaswamy, Ravishankar and Srinivasa, Gopal and Subramanya, Suhas Jayaram and Antonijevic, Andrija and Pryce, Dax and Kaczynski, David and Williams, Shane and Gollapudi, Siddarth and Sivashankar, Varun and Karia, Neel and Singh, Aditi and Jaiswal, Shikhar and Mahapatro, Neelam and Adams, Philip and Tower, Bryan and Patel, Yash},
   title = {{DiskANN: Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search}},
   url = {https://github.com/Microsoft/DiskANN},
   version = {0.6.1},
   year = {2023}
}