46 changes: 32 additions & 14 deletions README.md
@@ -1,20 +1,27 @@
# DiskANN
# DiskANN3: A Composable Vector Indexing Library

[![DiskANN Main](https://github.com/microsoft/DiskANN/actions/workflows/push-test.yml/badge.svg?branch=main)](https://github.com/microsoft/DiskANN/actions/workflows/push-test.yml)
[![PyPI version](https://img.shields.io/pypi/v/diskannpy.svg)](https://pypi.org/project/diskannpy/)
[![Downloads shield](https://pepy.tech/badge/diskannpy)](https://pepy.tech/project/diskannpy)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
DiskANN3 is a composable library for bringing scalable, accurate and cost-effective vector indexing to multiple databases. It draws on research from the DiskANN project. See the [research overview](https://github.com/microsoft/DiskANN/wiki/DiskANN-Project-and-Research-Overview-(2018%E2%80%90present)) page for more details and references.

[![DiskANN Paper](https://img.shields.io/badge/Paper-NeurIPS%3A_DiskANN-blue)](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf)
[![DiskANN Paper](https://img.shields.io/badge/Paper-Arxiv%3A_Fresh--DiskANN-blue)](https://arxiv.org/abs/2105.09613)
[![DiskANN Paper](https://img.shields.io/badge/Paper-Filtered--DiskANN-blue)](https://harsha-simhadri.org/pubs/Filtered-DiskANN23.pdf)
To use DiskANN3 in your system, implement the `DataProvider` trait for your store to describe how index terms such as vectors and adjacency lists should be stored and retrieved. DiskANN3 exposes vector update and query APIs to users and internally uses your `DataProvider` implementation to serve these requests.
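
As a rough illustration of the provider idea, the sketch below defines a deliberately simplified, hypothetical `DataProvider` trait backed by a `HashMap`, plus a toy greedy walk that touches the store only through the trait. The method names (`get_vector`, `set_neighbors`, etc.), the `HashMapProvider` type, and `greedy_search` are assumptions made for this example; they are not the actual trait or API in this repository, which is richer.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a provider trait.
/// The real `DataProvider` trait in the repository differs from this.
pub trait DataProvider {
    fn get_vector(&self, id: u32) -> Option<Vec<f32>>;
    fn set_vector(&mut self, id: u32, vector: Vec<f32>);
    fn get_neighbors(&self, id: u32) -> Vec<u32>;
    fn set_neighbors(&mut self, id: u32, neighbors: Vec<u32>);
}

/// Illustrative volatile store: vectors and adjacency lists in hash maps.
#[derive(Default)]
pub struct HashMapProvider {
    vectors: HashMap<u32, Vec<f32>>,
    adjacency: HashMap<u32, Vec<u32>>,
}

impl DataProvider for HashMapProvider {
    fn get_vector(&self, id: u32) -> Option<Vec<f32>> {
        self.vectors.get(&id).cloned()
    }
    fn set_vector(&mut self, id: u32, vector: Vec<f32>) {
        self.vectors.insert(id, vector);
    }
    fn get_neighbors(&self, id: u32) -> Vec<u32> {
        // A node without a stored adjacency list has no neighbors.
        self.adjacency.get(&id).cloned().unwrap_or_default()
    }
    fn set_neighbors(&mut self, id: u32, neighbors: Vec<u32>) {
        self.adjacency.insert(id, neighbors);
    }
}

/// Squared Euclidean distance between two equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Toy greedy walk over the graph: repeatedly move to the closest neighbor
/// of the current node and stop at a local minimum. Returns `None` if the
/// start node has no stored vector. The store is accessed only via the trait.
pub fn greedy_search<P: DataProvider>(provider: &P, start: u32, query: &[f32]) -> Option<u32> {
    let mut current = start;
    let mut best = squared_l2(&provider.get_vector(current)?, query);
    loop {
        let mut improved = false;
        for n in provider.get_neighbors(current) {
            if let Some(v) = provider.get_vector(n) {
                let d = squared_l2(&v, query);
                if d < best {
                    best = d;
                    current = n;
                    improved = true;
                }
            }
        }
        if !improved {
            return Some(current);
        }
    }
}
```

Because the search logic depends only on the trait, swapping the `HashMap` for a disk file, a k-v store, or a B-tree would not require changing `greedy_search` in this sketch.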

This repo offers the following Provider implementations as illustrative examples:
- In-memory providers, for maximum performance. These are volatile and not intended for use in databases. DiskANN3 with in-memory providers [outperforms](https://github.com/microsoft/DiskANN/wiki/Perf:-In%E2%80%90memory-providers) HNSWlib on throughput.
- Disk provider, for larger-than-memory indices. This is intended to match the performance of the first version of DiskANN reported in the [NeurIPS'19 paper](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf).
- [Garnet](https://github.com/Microsoft/Garnet)-based provider for high-throughput, scale-up vector search, and as an example of mapping to a k-v store. This [outperforms](https://github.com/microsoft/DiskANN/wiki/Perf:-Garnet-Providers-vs-other-Vector-DBs-(Zilliz,-Pinecone,-etc.)) all vector DBs on throughput, latency and recall.
- Bf-tree provider as an illustration of how to connect to a B-tree in your database.

The provider for [Cosmos DB NoSQL Vector Search](https://learn.microsoft.com/en-us/azure/cosmos-db/vector-search) is not included here but documented in the [VLDB'25 paper](https://www.vldb.org/pvldb/vol18/p5166-upreti.pdf).

> [!IMPORTANT]
> We are currently in the process of updating this repository with a new version of the code written in Rust.
The library supports the following algorithmic features:
- Real-time updates (using logic from [IP-DiskANN](https://arxiv.org/abs/2502.13826) and [Fresh-DiskANN](https://arxiv.org/abs/2105.09613)) that support stable recall under long update streams -- no merges, rebuilds, or patches needed.
- A diverse set of distance functions and quantizers (PQ, MinMax, Scalar, Spherical) implemented for x86 and aarch64.
- Choice of memory tiers to allow operation at different price-performance points.
- Vector search interfaces that allow pagination, range filters (e.g., `dist < 0.5`), and [diversity-aware](https://arxiv.org/abs/2602.08742) top-k search.
- Hooks to allow attribute-filter (predicate) processing along with vector search.
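
To make the search-interface semantics concrete, here is a minimal brute-force sketch of a range filter (`dist < radius`) and paginated top-k over an in-memory slice. It only illustrates what these query types return; the function names (`ranked`, `range_search`, `page`) are hypothetical and do not reflect the library's API or its graph-based implementation.

```rust
/// Euclidean distance between two equal-length slices.
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// All points as (index, distance) pairs, sorted by increasing distance.
/// Brute force: every point is scanned. For illustration only.
pub fn ranked(data: &[Vec<f32>], query: &[f32]) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| (i, l2(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored
}

/// Range filter: every point strictly closer than `radius`.
pub fn range_search(data: &[Vec<f32>], query: &[f32], radius: f32) -> Vec<(usize, f32)> {
    ranked(data, query)
        .into_iter()
        .take_while(|&(_, d)| d < radius)
        .collect()
}

/// Pagination: results `page_index * page_size ..` of the ranked list,
/// at most `page_size` of them.
pub fn page(
    data: &[Vec<f32>],
    query: &[f32],
    page_size: usize,
    page_index: usize,
) -> Vec<(usize, f32)> {
    ranked(data, query)
        .into_iter()
        .skip(page_index * page_size)
        .take(page_size)
        .collect()
}
```

A real index would avoid re-ranking the whole dataset for each page; the sketch recomputes it to keep the semantics easy to see.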

DiskANN is a suite of scalable, accurate and cost-effective approximate nearest neighbor search algorithms for large-scale vector search that support real-time changes and simple filters.
This code is based on ideas from Microsoft's [DiskANN](https://aka.ms/AboutDiskANN).
The main branch now contains a rearchitected project written in Rust.
## Getting Started

- Start with [diskann-benchmarks](/diskann-benchmark/README.md) to benchmark this library and its concrete implementations. This also allows you to build, store and load indices.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
@@ -24,7 +31,18 @@ See [guidelines](CONTRIBUTING.md) for contributing to this project.

## Legacy C++ Code

Older C++ code is retained on the `cpp_main` branch, but is not actively developed or maintained.
[![PyPI version](https://img.shields.io/pypi/v/diskannpy.svg)](https://pypi.org/project/diskannpy/)
[![Downloads shield](https://pepy.tech/badge/diskannpy)](https://pepy.tech/project/diskannpy)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)


Older C++ code, which implements the following papers, is retained on the `cpp_main` branch but is not actively developed or maintained.
This was the second rewrite of DiskANN algorithms.

[![DiskANN Paper](https://img.shields.io/badge/Paper-NeurIPS%3A_DiskANN-blue)](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf)
[![DiskANN Paper](https://img.shields.io/badge/Paper-Arxiv%3A_Fresh--DiskANN-blue)](https://arxiv.org/abs/2105.09613)
[![DiskANN Paper](https://img.shields.io/badge/Paper-Filtered--DiskANN-blue)](https://harsha-simhadri.org/pubs/Filtered-DiskANN23.pdf)

The legacy C++ code was forked from the [code for the NSG](https://github.com/ZJULearning/nsg) algorithm.

If you use the C++ version in your software, please cite the following: