RFC: integrate PiPNN as an alternative graph-index build algorithm #1049
SeliMeli wants to merge 4 commits into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds an RFC proposing PiPNN as an opt-in, feature-gated alternative to Vamana for DiskANN disk-index graph construction, keeping disk format and search API unchanged.
Changes:
- Introduces RFC 01049 detailing PiPNN’s algorithm, integration plan, and two-stage rollout
- Specifies a `BuildAlgorithm` selector design and feature-gating strategy (`pipnn`)
- Documents benchmark results and Stage-1 milestones gating potential Stage-2 deprecation of Vamana full rebuilds
Codecov Report
✅ All modified and coverable lines are covered by tests.

| | main | #1049 | +/- |
|---|---|---|---|
| Coverage | 90.60% | 90.60% | |
| Files | 461 | 461 | |
| Lines | 85494 | 85494 | |
| Hits | 77462 | 77465 | +3 |
| Misses | 8032 | 8029 | -3 |
Flags with carried forward coverage won't be shown.
Adds an RFC proposing PiPNN (arXiv:2602.21247) as a second graph-index build algorithm for DiskANN's disk index. Integration is two-stage: Stage 1 lands PiPNN behind a build-algorithm selector with Vamana as default; Stage 2 (conditional on Stage 1 milestones) retires Vamana's full-rebuild path while keeping it for incremental inserts via the hybrid update model. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per rfcs/README.md step 4: rename from 00000-short-title.md to NNNNN-short-title.md using the zero-padded PR number (microsoft#1049). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tones
- Add new M1 for in-memory build/search parity with Vamana (PiPNN today only feeds into `DiskIndexWriter`; a path that populates a `DiskANNIndex` directly for in-mem-only consumers is missing).
- Renumber M1-M7 → M2-M8.
- Convert each milestone's plain-text paragraph into bullet lists (Scope / Validation / etc.) for readability per RFC reviewer feedback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Explicitly document feature-gated deserialization behavior: configs with `"algorithm": "PiPNN"` fail at parse time in non-`pipnn` binaries with a serde unknown-variant error. This is not a backward-compatibility regression; configs without `build_algorithm` parse identically across feature combinations.
- Explain why the disk-edges path is no slower than the one-shot path despite the extra I/O (smaller working set; sequential append spills overlap with compute).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
943fa74 to 4fe210f
- **API:** add `diskann_pipnn::build_into_inmem_index(...)` returning an in-memory index that is read by the existing `DiskANNIndex::search` path unchanged.
- **Validation:** in-mem search recall on Enron 1M with PiPNN-built graph matches the disk-build + load round-trip recall within noise.
### M2 — Feature parity: checkpoint / resume
Checkpointing may need to move to Stage 2. The streaming checkpoint that Vamana currently uses does not fit PiPNN's batch build.
### Background
DiskANN currently builds the disk index with a single algorithm — **Vamana** (`diskann-disk/src/build/builder/`). Vamana incrementally inserts each point into a graph, running a greedy search + `RobustPrune` for each insertion, producing the on-disk format documented in `diskann-disk/src/storage/`.
Probably worth noting that today clients update their indices in three main ways:
- Incremental: continuously insert and delete vectors in an existing graph.
- Full rebuild: rebuild the entire graph from scratch, producing an immutable graph.
- Partitioned full rebuild: split points into N clusters, build N separate graphs, then stitch them together with a lightweight merge step to reduce peak memory usage during build.
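For readers less familiar with the Vamana step quoted above, `RobustPrune` (from the original DiskANN/Vamana paper) can be sketched as below. This is a minimal illustration, not the `diskann-disk` implementation: the function names, the `Vec<Vec<f32>>` point storage, and the use of plain Euclidean distance are all assumptions made for a self-contained example.

```rust
/// Euclidean distance between points `a` and `b` (illustrative storage layout).
fn l2(points: &[Vec<f32>], a: usize, b: usize) -> f32 {
    points[a]
        .iter()
        .zip(&points[b])
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Sketch of RobustPrune(p, V, alpha, R): pick up to `r` out-neighbors for `p`
/// from `candidates`, skipping candidates already "covered" by a chosen neighbor.
fn robust_prune(points: &[Vec<f32>], p: usize, candidates: &[usize], alpha: f32, r: usize) -> Vec<usize> {
    // Candidate pool without p itself and without duplicate ids.
    let mut pool: Vec<usize> = candidates.iter().copied().filter(|&c| c != p).collect();
    pool.sort_unstable();
    pool.dedup();
    // Closest-first order relative to p (stable sort keeps ties in id order).
    pool.sort_by(|&a, &b| l2(points, p, a).total_cmp(&l2(points, p, b)));

    let mut out = Vec::with_capacity(r);
    while !pool.is_empty() && out.len() < r {
        // Take the closest remaining candidate as the next out-neighbor.
        let best = pool.remove(0);
        out.push(best);
        // Keep c only if alpha * d(best, c) > d(p, c); otherwise `best`
        // already provides a short enough route from p towards c.
        pool.retain(|&c| alpha * l2(points, best, c) > l2(points, p, c));
    }
    out
}
```

For example, with 1-D points `[0.0, 1.0, 2.0, 3.0, -1.0]`, pruning point 0 against candidates `[1, 2, 3, 4]` with `alpha = 1.0` keeps `[1, 4]`: points 2 and 3 lie "behind" point 1 and are dropped, while point 4 lies in the other direction and survives. Larger `alpha` keeps more long-range edges.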
**PiPNN** (Pick-in-Partitions Nearest Neighbors, arXiv:2602.21247) is a partition-based **batch** graph builder, in contrast to Vamana's **incremental** insert + prune. The construction has four phases:
Minor: please provide the full link for convenience.
1. **Partition** — Randomized Ball Carving (RBC) recursively splits the dataset into small *overlapping* leaf clusters. Each point lands in `fanout` of its nearest cluster leaders at every recursion level, so every point appears in multiple leaves. Recursion stops when a cluster fits a configured leaf-size cap (`c_max`, typically 256–1024 points).
2. **Local k-NN per leaf** — For each leaf, compute the full pairwise distance matrix in one batched GEMM call, then extract each point's `leaf_k` nearest neighbors inside the leaf. GEMM batching is the source of most of PiPNN's wall-clock advantage over per-point greedy search.
@arkrishn94, is the local k-NN per leaf step a good candidate for implementation using the flat scan proposal we are working on?
A work item to track this discussion: #1036
The k-NN build in PiPNN is backed by a pairwise GEMM (N × N), which is quite different from the flat-scan scenario (1 × N). But we can measure it. Building k-NN with flat scan is closer to the all-to-all prune option in the paper, which is extremely slow.
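To make the N × N versus 1 × N distinction concrete, the two phases quoted above can be sketched on a toy dataset. This is a simplified illustration under stated assumptions, not the PiPNN crate: it performs only one partition level with caller-chosen leaders (no recursion, no randomization, no `c_max` check), uses squared L2, and computes the Gram matrix with naive loops where a real build would issue one BLAS GEMM per leaf. All function names are hypothetical.

```rust
/// Squared L2 distance between two vectors.
fn sq_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Phase 1, single level only: assign every point to its `fanout` nearest
/// leaders, so each point lands in several overlapping clusters.
fn assign_overlapping(points: &[Vec<f32>], leaders: &[usize], fanout: usize) -> Vec<Vec<usize>> {
    let mut clusters = vec![Vec::new(); leaders.len()];
    for (i, p) in points.iter().enumerate() {
        let mut by_dist: Vec<(f32, usize)> = leaders
            .iter()
            .enumerate()
            .map(|(c, &l)| (sq_l2(p, &points[l]), c))
            .collect();
        by_dist.sort_by(|a, b| a.0.total_cmp(&b.0));
        for &(_, c) in by_dist.iter().take(fanout) {
            clusters[c].push(i);
        }
    }
    clusters
}

/// Phase 2 for one leaf: build the Gram matrix G = X * X^T (the GEMM step),
/// convert it to squared distances ||x_i||^2 + ||x_j||^2 - 2 * G[i][j], and
/// keep each point's `leaf_k` nearest leaf-mates. Ids are global point ids.
fn leaf_knn(points: &[Vec<f32>], leaf: &[usize], leaf_k: usize) -> Vec<(usize, Vec<usize>)> {
    let n = leaf.len();
    let mut gram = vec![0.0f32; n * n];
    for i in 0..n {
        for j in 0..n {
            gram[i * n + j] = points[leaf[i]]
                .iter()
                .zip(&points[leaf[j]])
                .map(|(a, b)| a * b)
                .sum::<f32>();
        }
    }
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        let mut cand: Vec<(f32, usize)> = (0..n)
            .filter(|&j| j != i)
            .map(|j| (gram[i * n + i] + gram[j * n + j] - 2.0 * gram[i * n + j], leaf[j]))
            .collect();
        cand.sort_by(|a, b| a.0.total_cmp(&b.0));
        let nbrs: Vec<usize> = cand.iter().take(leaf_k).map(|&(_, id)| id).collect();
        out.push((leaf[i], nbrs));
    }
    out
}
```

The point of the sketch is the shape of the work: every pair inside a leaf is evaluated at once, which maps onto a dense matrix multiply, whereas a flat scan answers one query against N points at a time.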
| BigANN 10M (10M × 128, fp16, squared_l2) | 358s |
| Enron 10M (10M × 384, fp16, cosine_normalized) | 844s |
Frequent rebuilds (driven by data churn or parameter sweeps) and full rebuilds at 10M-scale and above are the bottleneck. PiPNN's offline benchmarks at matching recall budgets complete the same builds **up to 6.3× faster** while writing the same disk format (full numbers in the Benchmark Results section). This RFC proposes landing PiPNN so teams can opt into faster builds and so we can collect production-relevant signal on whether PiPNN can eventually replace Vamana's full-rebuild path.
builds up to 6.3× faster
This problem statement should clearly call out the trade-offs, constraints, and what we’re optimizing for.
We should also define concrete hypotheses and validate them.
For example: given a fixed set of workers (with N CPU cores, M RAM, and S SSD per node), PiPNN can improve index build throughput (vectors per minute per worker) by driving close to full CPU and RAM utilization. In contrast, Vamana cannot effectively leverage unused RAM to accelerate the build.
In other words, we should fix the available CPU, RAM, and SSD and compare build durations under the same resources.
Let's assume that Vamana needs 6GB to build a 10M-vector graph, but PiPNN needs 10GB.
Which algorithm is faster if we have:
- 3GB RAM available
- 6GB RAM available
- 10GB RAM available
- 20GB RAM available (can PiPNN autotune itself to use the full available RAM budget to reduce index build time?)
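The resource-locked comparison proposed above can be expressed as two small helpers: the throughput metric (vectors per minute per worker) and a feasibility filter that decides which builders may even enter the race at a given RAM budget. This is a sketch only; the function names are hypothetical, the 6GB/10GB peaks are the reviewer's illustrative assumptions rather than measurements, and the 358s example is taken from the quoted BigANN 10M table row, whose column header (and therefore which builder produced it) is not visible in this thread.

```rust
/// Throughput metric from the review: vectors built per minute per worker.
fn vectors_per_minute_per_worker(num_vectors: u64, build_secs: f64, workers: u32) -> f64 {
    num_vectors as f64 / (build_secs / 60.0) / f64::from(workers)
}

/// Builders whose peak build RAM fits within `budget_gb`. Only these are
/// eligible for the "which is faster" comparison at that budget.
fn feasible<'a>(budget_gb: f64, peaks_gb: &[(&'a str, f64)]) -> Vec<&'a str> {
    peaks_gb
        .iter()
        .filter(|(_, peak)| *peak <= budget_gb)
        .map(|(name, _)| *name)
        .collect()
}
```

Under the reviewer's assumed peaks `[("Vamana", 6.0), ("PiPNN", 10.0)]`, neither fits at 3GB, only Vamana fits at 6GB, and both fit at 10GB and 20GB; timing comparisons would then be run per budget over the feasible set. A single worker building 10M vectors in 358s corresponds to roughly 1.68M vectors per minute.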
- **Bulk / full rebuild → PiPNN.** When data churn is large enough to justify a full rebuild, PiPNN is used because it is several times faster than Vamana at this job.
- **Incremental insert → Vamana.** Between full rebuilds, individual inserts use Vamana's existing greedy-search + RobustPrune insert path. PiPNN's batch design has no natural single-point-insert API and we do not plan to build one.
- **Quality decay → trigger PiPNN rebuild.** When recall on the live graph degrades past a configured threshold (driven by accumulated incremental inserts), the system schedules a PiPNN full rebuild from the current dataset snapshot.
I don’t think we expect the graph to degrade enough that a full rebuild from scratch is necessary. DiskANN claims that the graph can remain healthy over time with incremental updates.
Yes, but a rebuild is still an option in other cases, such as embedding rotations, schema changes, parameter retuning, and batch inserts. Those cases could benefit from PiPNN as well. Will fix the wording.
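The hybrid update model discussed in this thread boils down to a routing rule: point inserts stay on Vamana, and anything that warrants rebuilding from scratch goes to PiPNN. A minimal sketch, assuming the rebuild triggers named in the reply above; the `Builder` and `UpdateEvent` types are hypothetical and not part of the RFC's API.

```rust
/// Which graph builder handles a given update (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Builder {
    Vamana,
    PiPNN,
}

/// Update events named in this thread (hypothetical type).
#[derive(Debug, Clone, Copy)]
enum UpdateEvent {
    /// Single-point inserts into a loaded in-memory graph.
    IncrementalInsert,
    /// Rebuild triggers mentioned in the reply above.
    EmbeddingRotation,
    SchemaChange,
    ParameterRetune,
    BatchInsert,
}

fn builder_for(event: UpdateEvent) -> Builder {
    match event {
        // Point inserts use Vamana's greedy search + RobustPrune path;
        // PiPNN has no single-point insert API.
        UpdateEvent::IncrementalInsert => Builder::Vamana,
        // Everything else is a full rebuild from a dataset snapshot.
        UpdateEvent::EmbeddingRotation
        | UpdateEvent::SchemaChange
        | UpdateEvent::ParameterRetune
        | UpdateEvent::BatchInsert => Builder::PiPNN,
    }
}
```

Using an exhaustive `match` without a wildcard arm means that adding a new event kind forces an explicit routing decision at compile time.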
### Goals
1. **Algorithm-level pluggability**: introduce a build-algorithm selector to the build pipeline that routes between Vamana (existing) and PiPNN (new). Existing build sites continue to default to Vamana with no behavior change.
2. **Disk format compatibility**: the PiPNN-built index is byte-compatible with Vamana-built indexes on disk — search, PQ, and storage layouts are unchanged. This is the foundation for the hybrid update model.
There are no plans to mutate the disk index.
Will fix the wording: only in-memory Vamana can insert into a loaded graph; the disk index is rebuilt.
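The selector in Goal 1, combined with the feature-gated parsing behavior described in the commit notes, can be sketched as an enum whose `PiPNN` variant only exists when the `pipnn` Cargo feature is enabled. The commit notes say the real config goes through serde with an `"algorithm"` field; to keep this example dependency-free it uses `std::str::FromStr` instead, and the `parse_build_algorithm` helper and error text are hypothetical. Only `BuildAlgorithm`, `Vamana`, `PiPNN`, and `pipnn` come from the RFC.

```rust
use std::str::FromStr;

/// Build-algorithm selector; Vamana stays the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum BuildAlgorithm {
    #[default]
    Vamana,
    /// Only compiled in when the `pipnn` Cargo feature is enabled.
    #[cfg(feature = "pipnn")]
    PiPNN,
}

impl FromStr for BuildAlgorithm {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Vamana" => Ok(BuildAlgorithm::Vamana),
            #[cfg(feature = "pipnn")]
            "PiPNN" => Ok(BuildAlgorithm::PiPNN),
            // Without the feature, "PiPNN" falls through to this arm,
            // mirroring serde's unknown-variant error.
            other => Err(format!("unknown variant `{other}`")),
        }
    }
}

/// A missing field means "use the default", so configs that never set the
/// algorithm parse identically with or without the feature.
fn parse_build_algorithm(field: Option<&str>) -> Result<BuildAlgorithm, String> {
    match field {
        None => Ok(BuildAlgorithm::default()),
        Some(s) => s.parse(),
    }
}
```

Built without `--features pipnn`, `parse_build_algorithm(Some("PiPNN"))` returns an error at parse time, while `None` and `Some("Vamana")` both yield `BuildAlgorithm::Vamana`; with the feature enabled, the extra arm accepts `"PiPNN"`.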
Summary
This RFC proposes adding PiPNN (Pick-in-Partitions Nearest Neighbors, arXiv:2602.21247) as a second graph-construction algorithm for DiskANN's disk index, alongside the existing Vamana builder.
PiPNN produces a graph byte-compatible with Vamana's disk format and search API, at up to 6.3× lower build time on our measured workloads (Enron 10M, BigANN 10M). Vamana remains the default and the only algorithm supported for incremental inserts; PiPNN is the proposed faster path for full rebuilds.
Two-stage integration
`BuildAlgorithm` selector with `Vamana` as the default. PiPNN is opt-in via a `pipnn` Cargo feature. Existing build sites see no behavior change. Stage 1 defines explicit milestones (M0–M7) gating Stage 2 readiness.
Highlights
`build_ram_limit_gb` knob, bringing PiPNN's peak RSS to or below Vamana's at a configurable build-time cost.
Reviewers
Please read the full RFC for trade-offs, benchmark tables, and milestone definitions.
🤖 Generated with Claude Code