bf_tree migration away from diskann-providers#1020
Conversation
b94f4df to
44be934
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1020 +/- ##
=========================================
Coverage 89.51% 89.52%
=========================================
Files 461 476 +15
Lines 85920 90193 +4273
=========================================
+ Hits 76912 80746 +3834
- Misses 9008 9447 +439
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
ca413d4 to
1051152
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Migrates the async bf_tree provider away from diskann-providers by extracting it into a dedicated diskann-bf_tree-provider crate, while also modernizing quantization (PQ → spherical) and simplifying deletion semantics (hard deletes).
Changes:
- Extracts bf_tree + caching into
diskann-bf_tree-providerand updates workspace/CI accordingly. - Replaces PQ-based quant vector store with spherical quantization (
Poly<dyn Quantizer>) + updated serialization. - Simplifies delete handling by removing the delete-provider generic and switching to hard deletes.
Reviewed changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-providers/src/model/graph/provider/async_/table_delete_provider.rs | Makes delete-table operations public and removes bf_tree-specific bitmap (de)serialization helpers/tests. |
| diskann-providers/src/model/graph/provider/async_/mod.rs | Adjusts module exports, removes bf_tree/caching modules, re-exports postprocess utilities. |
| diskann-providers/src/model/graph/provider/async_/caching/utils.rs | Updates import path and minor formatting in tests. |
| diskann-providers/src/model/graph/provider/async_/caching/provider.rs | Refactors imports/bounds formatting; adds Future import. |
| diskann-providers/src/model/graph/provider/async_/caching/mod.rs | Removes caching module root from diskann-providers. |
| diskann-providers/src/model/graph/provider/async_/caching/example.rs | Updates test imports to use diskann_providers / local module paths. |
| diskann-providers/src/model/graph/provider/async_/caching/bf_cache.rs | Updates ConfigError import path and test formatting. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/vector_provider.rs | Adds hard-delete helper and adjusts TestCallCount assertions. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Switches PQ → spherical quantization + new query/distance computer adapter + test updates. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/provider.rs | Removes delete-provider generic, implements hard delete, switches postprocess to CopyIds, updates quantizer serialization. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/neighbor_provider.rs | Updates TestCallCount import path. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/mod.rs | Switches re-exports to new bf_tree_provider module and removes QuantAccessor re-export. |
| diskann-providers/Cargo.toml | Removes bf_tree feature/deps from diskann-providers. |
| diskann-bf_tree-provider/src/lib.rs | Introduces new crate entry point with provider + caching modules. |
| diskann-bf_tree-provider/src/caching/mod.rs | Adds caching module root (example tests disabled with TODO). |
| diskann-bf_tree-provider/Cargo.toml | Adds new crate manifest + dependencies for bf_tree provider and spherical quantization. |
| Cargo.toml | Adds diskann-bf_tree-provider to workspace members. |
| .github/workflows/ci.yml | Removes bf_tree from explicitly tested feature list. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1219bcc to
cbea923
Compare
|
@copilot would you mind giving this a review |
Reviewed the current branch at |
Add bf_tree insert benchmarks (Criterion + IAI)
Add pre-migration bf_tree provider benchmarks mirroring the existing
inmem insert benchmark. Both Criterion (wall-clock) and IAI-Callgrind
(instruction-count) benchmarks use the same sift-256 dataset and PQ
training as the inmem benchmarks for apples-to-apples comparison.
Gated behind #[cfg(feature = "bf_tree")] so existing benchmarks
continue to work without the feature flag.
Run with: cargo bench --bench bench_main --features virtual_storage,bf_tree
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add bf_tree search benchmarks for migration parity
Add search benchmarks (Criterion + IAI-Callgrind) alongside existing
insert benchmarks. Search exercises the read path through
BfTreeProvider (vector lookups, neighbor access, greedy search
traversal) which is critical for detecting regressions post-migration.
The search benchmark builds the index once during setup, then measures
10 KNN queries (k=10, l=20) against the populated index.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract bf_tree provider into diskann-bf_tree-provider crate
Move the bf_tree provider and caching modules from diskann-providers
into the dedicated diskann-bf_tree-provider crate. This makes the
bf-tree dependency explicit at the crate level rather than hidden
behind a feature flag.
Changes:
- Move bf_tree/ and caching/ modules to diskann-bf_tree-provider/src/
- Set up Cargo.toml with proper dependencies (bf-tree, diskann,
diskann-providers, diskann-utils, diskann-vector, etc.)
- Convert all crate:: imports to diskann_providers:: imports
- Bump visibility of postprocess module, AsDeletionCheck/DeletionCheck
traits, and TableDeleteProviderAsync methods to pub
- Remove bf_tree feature flag and optional deps from diskann-providers
- Remove cfg(feature = "bf_tree") gates from table_delete_provider.rs
- Rename provider/provider.rs to provider/bf_tree_provider.rs (clippy)
- Disable caching example.rs tests (cross-crate cfg(test) limitation)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move bf_tree benchmarks to diskann-bf_tree-provider
Migrate Criterion and IAI-Callgrind benchmark targets from
diskann-providers to the new diskann-bf_tree-provider crate.
Update imports to use the new crate's module paths. Remove
bf_tree benchmark remnants (cfg gates, modules) from
diskann-providers benches.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cargo.lock got missed by last commit
Move delete bitmap serialization to diskann-bf_tree-provider
Move to_bytes/from_bytes logic out of TableDeleteProviderAsync (where
it was bf_tree-scoped) into a dedicated delete_bitmap_serde module in
the new crate. The reimplementation works through the public API and
produces byte-identical output (LE u32 words). Revert the methods in
diskann-providers to pub(crate) + #[cfg(test)].
Adds format-level compatibility tests including known fixtures and
padding bit rejection.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move DeletionCheck traits and RemoveDeletedIdsAndCopy to diskann::graph::glue
These are core to the inplace delete algorithm's search post-processing
and belong in the diskann crate alongside SearchPostProcess, CopyIds,
FilterStartPoints, and Pipeline. This removes the need to widen
diskann-providers' postprocess module to pub for external crates.
Also removes the now-dead to_bytes/from_bytes methods and their tests
from TableDeleteProviderAsync in diskann-providers, since that
serialization logic has been moved to diskann-bf_tree-provider.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use workspace_root() for benchmark test data paths
Replace env!("CARGO_MANIFEST_DIR")-relative paths with
diskann_utils::workspace_root() so benchmarks find test_data/
without requiring symlinks in each crate directory.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The bf_tree feature was removed from diskann-providers when the provider was extracted into its own crate (diskann-bf_tree-provider). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rewrite QuantVectorProvider to use Arc<Poly<dyn Quantizer>> - Add local HybridComputer and QuantQueryComputer wrapper - Update CreateQuantProvider impl for Poly<dyn Quantizer> - Update serialization to use Quantizer::serialize/try_deserialize - Rewrite all tests and doc examples for spherical quantizer - Update benchmarks to use SphericalQuantizer::train with ReciprocalMeanNorm - Move rand to regular dependencies (needed by doc tests) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Delete HybridAccessor, HybridComputer, and all Hybrid strategy impls (Search, PostProcess, Prune, Insert, MultiInsert, InplaceDelete) - Delete Rerank post-processor (only used by Hybrid) - Remove QuantAccessor (only constructed by Hybrid search path) - Move QuantQueryComputer from hybrid_computer.rs to quant_vector_provider.rs - Delete hybrid_computer.rs - Remove metric field from QuantVectorProvider (dead after Hybrid removal) - Add SaveWith/LoadWith serialization impls back (accidentally lost during D removal) - Fix test compilation: remove deleted_ids references, fix turbofish generics, insert vectors before status checks, skip hard-deleted IDs in verification loops - Update doc examples to remove NoDeletes/TableBasedDeletes from constructors - Clean up unused imports (distances::pq, Hybrid, DistanceFunction, etc.) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ree impl now uses hard deletion)
Restore postprocess.rs in diskann-providers with AsDeletionCheck, DeletionCheck, and RemoveDeletedIdsAndCopy. These were temporarily moved into diskann/src/graph/glue.rs during bf_tree extraction but belong alongside the inmem providers that use them. - Create diskann-providers/src/model/graph/provider/async_/postprocess.rs - Remove traits and struct from diskann/src/graph/glue.rs - Update all inmem provider imports to use local postprocess module - Update DeletionCheck impls on NoDeletes and TableDeleteProviderAsync Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Restore QuantAccessor with Opaque-based element types for spherical quantization compatibility - Add Quantized strategy impls: SearchStrategy, DefaultPostProcessor, PruneStrategy, InsertStrategy, MultiInsertStrategy, InplaceDeleteStrategy - Use PassThrough working set pattern (matching inmem spherical approach) - Add OwnedOpaque newtype for workingset::View (bf_tree copies vs zero-copy) - Add BuildDistanceComputer impl using UnwrapErr<DistanceComputer> - Remove #[cfg(test)] from get_vector_into and get_vector_sync Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Hybrid searches in quantized space (same as Quantized), then reranks final candidates using full-precision distances. Pruning stays in quantized space. This replaces the old HybridComputer with its 4 distance modes (full×full, quant×quant, cross-type) with a simpler search-quant → rerank-full → prune-quant pipeline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reverts accidental pub visibility on TableDeleteProviderAsync methods (is_deleted, delete, undelete, clear) and removes unused RemoveDeletedIdsAndCopy re-export. Nothing outside diskann-providers uses these, so widening the public API was unnecessary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix FullAccessor::on_elements_unordered to silently skip deleted entries - Fix QuantAccessor::on_elements_unordered to silently skip deleted entries - Add create_quant_index and create_full_precision_index test helpers - Add tests: quantized insert+search, multi_insert, delete+search - Add tests: hybrid insert+search, hybrid delete+search - Add tests: full precision insert+search, full precision delete+search - Add doc comments for OwnedOpaque, strategy descriptions, and BfTreeProvider Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rerank::post_process used .expect() on get_vector_sync, which could panic if a concurrent delete removed a vector between the quantized search phase and the reranking phase. Replace with .filter_map() to silently skip unreadable entries, consistent with the pattern used in QuantAccessor::on_elements_unordered and FullAccessor. Caught during local code review. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0ad39b8 to
83e6231
Compare
…on delete FullAccessor::get_element and FullPrecision::get_delete_element used .expect() which was safe under the old soft-delete model where entries remained physically in the bf_tree. With hard deletes, data is removed from the tree immediately but graph edges are cleaned up lazily via drop_deleted_neighbors. Stale edges can cause get_vector_into to fail for deleted IDs, making .expect() a panic risk in production. Change GetError and DeleteElementError from Panics to ANNError and propagate errors to callers, consistent with QuantAccessor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks Jordan, we can do quite a bit of cleanup on the crate structure and item names which I think would really start to tighten up the crate.
One question I had for you is what the expected end state is: are we targeting a full clean up or an incremental, non-published state while we continue working? I ask because the bf-tree provider inherited a bit of cruft from earlier iterations of the code, and I'm wondering how aggressiver we want to be in this PR in particular, considering that we still need integration tests and benchmarks.
| "diskann-benchmark", | ||
| "diskann-tools", | ||
| "vectorset", | ||
| "vectorset", "diskann-bf_tree", |
There was a problem hiding this comment.
diskann-bf_tree is quite a name ... maybe diskann-bftree? At least something that doesn't mix hyphens and underscores :laugh:
Also - the formatting here is a little off.
There was a problem hiding this comment.
oh strange, i missed that. Also, yeah this is like the third iteration on a name. I'll just make it diskann-bftree.
| pub mod fast_memory_quant_vector_provider; | ||
| pub use fast_memory_quant_vector_provider::FastMemoryQuantVectorProviderAsync; | ||
|
|
||
| pub(crate) mod postprocess; |
There was a problem hiding this comment.
I'll move that back.
| diskann-vector.workspace = true | ||
| half = { workspace = true, features = ["bytemuck", "num-traits"] } | ||
| futures-util.workspace = true | ||
| polonius-the-crab = "0.5.0" |
There was a problem hiding this comment.
Don't need polonius-the-crab now that the caching provider is gone.
| //! [`DataProvider`](diskann::provider::DataProvider) trait, enabling indexes that can | ||
| //! transparently spill to disk for datasets larger than available memory. | ||
| pub mod provider; |
There was a problem hiding this comment.
If the only contents here are a directory, we can move away from the madness of the current inmem and flatten the hierarchy even further. What about something like this:
diskann-bftree/
src/
lib.rs
provider.rs
vectors.rs
neighbors.rs
quant.rs
And the following renames:
BfTreeProvider->Provider: Users would import asdiskann_bftree::Provider`, which IMO is completely unambiguous.NeighborProvider->Neighbors.VectorProvider->Vectors.QuantVectorProvider->QuantVectors.
There was a problem hiding this comment.
sure, i can do that.
|
|
||
| use super::super::common::TestCallCount; | ||
| use super::ConfigError; | ||
| use diskann_providers::model::graph::provider::async_::common::TestCallCount; |
There was a problem hiding this comment.
Maybe either duplicate TestCallCount or drop entirely? Alternatively, we can move this into the Context used by the bf-tree provider and always gather these metrics. It's generally pretty useful to have these things.
| return std::future::ready(Err(e)); | ||
| } | ||
| if let Err(e) = self.neighbor_provider.set_neighbors(id, &[]) { | ||
| return std::future::ready(Err(e)); |
There was a problem hiding this comment.
Would this second condition be sufficient? Otherwise, there's a transient situation where concurrent accesses could see a deleted vector, which IIRC we don't handle gracefully yet.
As an aside, we should change the NeighborAccessor traits to have ToRanked errors instead of ANNError.
There was a problem hiding this comment.
Changing the semantics of deleted vector to being deleted in the underlying BFTree means we can no longer use the Panics error. Instead, we need a true ToRanked error with a transient when accessing deleted vectors. And this needs to get propagated all the way down to the inherent access methods on the various providers to avoid overheads associated with ANNError.
| /// | ||
| pub struct HybridAccessor<'a, T, D> | ||
| // Pass-through fill — returns `&Self` which directly accesses the underlying provider. | ||
| impl<T> workingset::Fill<PassThrough> for QuantAccessor<'_, T> |
There was a problem hiding this comment.
We really really don't want to use PassThrough here. While the prune algorithm caches the result internally to avoid refetching for every pairwise distance, Map will allow vectors to be reused across multiple prunes.
| type DistanceComputer = distances::pq::HybridComputer<T>; | ||
| type QueryComputer = T::QueryDistance; | ||
| type SearchAccessor<'a> = FullAccessor<'a, T, Q>; | ||
| type SearchAccessorError = Panics; |
There was a problem hiding this comment.
I think this can be Infallible instead?
| impl<T, D> SearchStrategy<BfTreeProvider<T, QuantVectorProvider, D>, &[T]> for Hybrid | ||
| /// Hybrid strategy: search in quantized space, rerank with full-precision distances, | ||
| /// prune in quantized space. | ||
| impl<T> SearchStrategy<BfTreeProvider<T, QuantVectorProvider>, &[T]> for Hybrid |
There was a problem hiding this comment.
We can completely ditch Hybrid as a strategy, just using FullPrecision and Quantized. Quantized search should still rerank, but that's generally expected.
Migrate bf_tree provider to dedicated crate and modernize
Motivation
The long-term goal is to remove
diskann-providersentirely. The bf_tree provider was one of several components keeping that crate alive — it depended ondiskann-providersfor generic delete infrastructure (DeletionCheck/AsDeletionCheck/RemoveDeletedIdsAndCopy) and PQ quantization types. This PR extracts bf_tree into a standalonediskann-bf_treecrate, cuts those dependencies, simplifies the generics, and switches from PQ to spherical quantization — all to reduce the surface area ofdiskann-providersand move closer to removing it.What Changed
1. Crate Extraction (
bf7d33fd)Moved the bf_tree provider from
diskann-providers/src/model/graph/provider/async_/bf_tree/into a standalonediskann-bf_treecrate. Updated workspaceCargo.tomland CI configuration.2. PQ → Spherical Quantization (
74149bbc)QuantVectorProvider: StoresArc<Poly<dyn Quantizer>>(spherical) instead ofArc<FixedChunkPQTable>.QuantQueryComputer: Newtype wrapper adapting spherical'sOpaque-based API toPreprocessedDistanceFunction<&[u8], f32>.Quantizer::serialize/iface::try_deserialize(flatbuffers format).CreateQuantProvider: Now implemented forPoly<dyn Quantizer>instead ofFixedChunkPQTable.3. Remove
D(Delete Provider) Parameter (14f504e8)BfTreeProvider<T, Q, D>→BfTreeProvider<T, Q>.delete(key)API replace bitmap-based soft deletes.delete()removes from all three trees (neighbors, full_vectors, quant_vectors).status_by_internal_id()checks vector presence (Ok = valid, Err = deleted).delete_bitmap_serde.rs,NoDeletes/TableBasedDeletesfrom constructors.4. Simplify to Core Strategies, Then Re-add (
02d355fb→c598d71d)First removed the old Hybrid/QuantAccessor/hybrid_computer code to start fresh, then re-added clean implementations:
QuantAccessor(e1ac1011): New accessor that reads quantized vectors from the bf_tree and usesQuantQueryComputerfor distance. Implements allQuantizedstrategy traits.Hybridstrategies withRerank(c598d71d): Re-introducedHybridas a struct withmax_fp_vecs_per_prune: Option<usize>.Rerankpost-processor re-ranks quantized search results using full-precision vectors. ImplementsSearchPostProcess,SearchPreprocess,InsertPreprocess, andExpandBeamforHybrid.benches/directory).5. Use
CopyIdsInstead ofRemoveDeletedIdsAndCopy(8a3b6742)Since bf_tree uses hard deletes, deleted IDs never appear in neighbor lists. The
AsDeletionCheck/DeletionCheck/RemoveDeletedIdsAndCopymachinery is unnecessary. Replaced withglue::CopyIdsand removed theAsDeletionCheckimpl fromFullAccessor.6. Move
DeletionCheckTraits Back todiskann-providers(38f1e614)These traits were temporarily moved into
diskann/src/graph/glue.rsduring the extraction as a stop-gap. Now that bf_tree no longer depends on them, they are restored to their original location indiskann-providers/src/model/graph/provider/async_/postprocess.rswithpub(crate)visibility.7. Scope
diskann-providersAPI Back topub(crate)(ee595e2c)Reverted methods on
TableDeleteProviderAsync(is_deleted,delete,undelete,clear) frompubback topub(crate). Removed unusedRemoveDeletedIdsAndCopyre-export.8. Tests and Delete-Panic Fix (
cbea923f)FullAccessor::on_elements_unorderedandQuantAccessor::on_elements_unorderednow silently skip deleted entries instead of propagating hard errors. bf_tree does hard deletes but graph edges still reference deleted nodes — when search traverses those edges,get_vector_intoreturns anANNError(always-escalate), which panicked inexpand_beam. The fix skips entries that fail to read.test_quantized_index_search,test_quantized_index_multi_insert_search,test_quantized_delete_and_search,test_hybrid_index_search,test_hybrid_delete_and_search,test_full_precision_index_search,test_full_precision_delete_and_search.OwnedOpaqueand indexing strategy descriptions onBfTreeProvider.Poly<dyn Quantizer>instead ofFixedChunkPQTable.Dparameter removed, noNoDeletes/TableBasedDeletesneeded.Net Impact
diskann-bf_treeis now a self-contained crate with two generic parameters (T,Q) instead of threeFullPrecision,Quantized,Hybrid(withRerankpost-processor)DeletionCheck/AsDeletionCheck/RemoveDeletedIdsAndCopydiskann-providersAPI surface reduced (methods scoped back topub(crate))