[PQ Cleanup] Part 2: Consolidate calculate_chunk_offsets* #976
Conversation
Relocates the object pool module so that it is available to crates that depend on diskann-utils but not diskann (notably diskann-quantization, which will gain pool-aware distance-table allocation in a follow-up). diskann::utils::object_pool stays as a re-export for backwards compatibility. Direct importers in diskann-providers, diskann-disk, and diskann-garnet are switched to use diskann_utils::object_pool directly. Internal diskann users continue to use the re-export.
calculate_chunk_offsets* and remove redundant distance impls and calculate_chunk_offsets* and remove redundant distance impls and
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Relocates PQ-related chunk offset helpers into diskann-quantization, centralizes the cosine→L2 fallback rationale for disk PQ preprocessing, and removes redundant distance trampoline impls by leaning on accessor element refs.
Changes:
- Moved
calculate_chunk_offsets[_auto]fromdiskann-providersintodiskann-quantization::viewsand updated call sites. - Migrated
object_poolusage todiskann-utils::object_poolacross crates and removeddiskann::utils::object_pool. - Deduplicated cosine/L2 fallback commentary and removed redundant distance impls.
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann/src/utils/mod.rs | Removes utils::object_pool module exposure. |
| diskann/src/graph/search/scratch.rs | Switches AsPooled import to diskann_utils. |
| diskann/src/graph/index.rs | Switches ObjectPool/PooledRef import to diskann_utils. |
| diskann-utils/src/lib.rs | Exposes object_pool module publicly from diskann-utils. |
| diskann-quantization/src/views.rs | Adds calculate_chunk_offsets[_auto] helpers. |
| diskann-providers/src/model/pq/pq_construction.rs | Removes local chunk-offset helpers and imports from quantization crate. |
| diskann-providers/src/model/pq/mod.rs | Stops re-exporting chunk-offset helpers from providers PQ module. |
| diskann-providers/src/model/pq/distance/test_utils.rs | Updates import path for calculate_chunk_offsets_auto. |
| diskann-providers/src/model/pq/distance/l2.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/pq/distance/innerproduct.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/pq/distance/dynamic.rs | Switches object pool import and removes redundant trait impls. |
| diskann-providers/src/model/pq/distance/common.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/mod.rs | Removes re-export of calculate_chunk_offsets_auto. |
| diskann-providers/src/model/graph/provider/async_/memory_quant_vector_provider.rs | Switches object pool import to diskann_utils. |
| diskann-providers/src/model/graph/provider/async_/fast_memory_quant_vector_provider.rs | Switches object pool import and updates tests for new argument types. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Switches object pool import to diskann_utils. |
| diskann-garnet/src/provider.rs | Switches object pool imports to diskann_utils. |
| diskann-disk/src/search/provider/disk_provider.rs | Switches object pool imports to diskann_utils. |
| diskann-disk/src/search/pq/quantizer_preprocess.rs | Centralizes cosine→L2 fallback rationale into module docs. |
| diskann-benchmark/src/backend/exhaustive/product.rs | Updates chunk-offset helper call site to diskann-quantization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #976 +/- ##
==========================================
- Coverage 89.55% 89.47% -0.08%
==========================================
Files 459 461 +2
Lines 85006 85559 +553
==========================================
+ Hits 76131 76558 +427
- Misses 8875 9001 +126
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…ng_data to test helper
calculate_chunk_offsets* and remove redundant distance impls and calculate_chunk_offsets* and remove redundant distance impls
…up-3 # Conflicts: # diskann-disk/src/storage/quant/pq/pq_generation.rs
calculate_chunk_offsets* and remove redundant distance implscalculate_chunk_offsets* and accum_row_inplace
hildebrandmw
left a comment
There was a problem hiding this comment.
Turns out you updated the code midway through the review 😄
That said, I think my comments still hold.
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks! Just one more quick comment about the error type for the new constructors then looks fine to me. I also still prefer partition/partition_in for the function names since it's partitioning dim into the requested number of chunks.
calculate_chunk_offsets* and accum_row_inplace calculate_chunk_offsets*
# DiskANN v0.52.0 Release Notes ## Breaking Changes An AI generated, human reviewed list of changes is summarized below. ### `get_degree_stats` signature changed ([#998](#998)) `DiskANNIndex::get_degree_stats` now takes an explicit iterator of IDs instead of requiring the data provider to implement `IntoIterator`. ```rust // Before — provider had to impl IntoIterator index.get_degree_stats(&mut accessor)?; // After — caller supplies the ID iterator index.get_degree_stats(&mut accessor, id_iter)?; ``` ### PQ dimension contract tightened; entries now `&[f32]` only ([#1044](#1044)) With `AlignedBoxWithSlice` removed from the PQ path, the dimension handling has been refactored into a three-layer contract: | Layer | Where | Contract | |---|---|---| | **Boundary (inmem)** | `QueryComputer::new`, `MultiQueryComputer::new`, `DistanceComputer::evaluate_similarity` | `len == dim` (returns `Err` on mismatch) | | **Boundary (disk)** | `PQScratch::set` | `len >= dim`, slices to `[..dim]` | | **Internal** | `TableL2/IP/Cosine::{new, populate}` | Trusted — no re-validation | **Other changes:** - PQ table populate/distance methods now accept `&[f32]` instead of `<U: Into<f32>>`. Callers must pre-decode quantized vectors via `VectorRepr::as_f32`. - Generic trampoline impls (`&Vec<u8>`, `&&[u8]`) on `QueryComputer` / `DistanceComputer` have been removed. ### `calculate_chunk_offsets` relocated to `ChunkOffsets` constructors ([#976](#976)) The free functions `calculate_chunk_offsets` and `calculate_chunk_offsets_auto` have been moved into constructors on `ChunkOffsets` / `ChunkOffsetsView` in `diskann-quantization::views`. ```rust // Before let offsets = calculate_chunk_offsets(dim, num_chunks); // After (allocating) let offsets = ChunkOffsets::partition(dim, num_chunks)?; // After (zero-alloc, borrows caller-owned scratch) let view = ChunkOffsetsView::partition_into(dim, &mut scratch)?; ``` Additionally, `get_chunk_from_training_data` has been moved from public API. ### `CachingProvider` removed ([#1052](#1052)) The entire `diskann_providers::model::graph::provider::async_::caching` module has been deleted. **Why:** The `CachingProvider` was an experiment in transparent caching over `DataProvider`. In practice it required double monomorphization of the indexing code, didn't save integration work for bulk methods like `on_elements_unordered`/`distances_unordered`, and was complex to maintain. An internal user who …migrated off it removed ~1,000 lines of code, improved compile times by ~20%, and substantially reduced complexity. **Upgrade:** Manage caching directly in your `DataProvider` implementation. ## New Features ### AVX-512 4-bit distance kernels ([#1045](#1045)) Native V4 (AVX-512) specializations for 4-bit packed vector distance computations: - **`SquaredL2`** — 16 × `u32` lanes per iteration via `_mm512_madd_epi16`. - **`InnerProduct`** — AVX-512 VNNI (`_mm512_dpbusd_epi32`) over `u8x64` / `i8x64` operands. Previously, V4 hardware fell back to two AVX2 (V3) kernel invocations per 512-bit chunk. The native kernels double per-instruction throughput. No API changes — existing code benefits automatically on AVX-512 capable hardware. ## Merged PRs * Deprecate 32-bit targets by @suhasjs in #1022 * Add a fast path to `Map::prepare`. by @hildebrandmw in #1023 * Add boundary checks in gen_associated_data_from_range() by @Copilot in #847 * [deps] Don't pull `rayon` as a dependency of `diskann`. by @hildebrandmw in #1024 * Bump openssl from 0.10.78 to 0.10.79 by @dependabot[bot] in #1026 * Cleaning up test work and changing the get_degree_stats signature. by @JordanMaples in #998 * Reduce scalar-quantization benchmark monomorphization by @suri-kumkaran in #1041 * [diskann-vector] Support truly unaligned distances. by @hildebrandmw in #981 * rename spherical.json to graph index with spherical quantization by @harsha-simhadri in #1042 * [PQ Cleanup] Part 2: Consolidate `calculate_chunk_offsets*` by @arkrishn94 in #976 * PQ: tighten dim contract; right-size scratch buffer by @wuw92 in #1044 * Add v4 distance kernels (4-bit SquaredL2 / InnerProduct) by @m3hm3t in #1045 * Remove the Caching Provider by @hildebrandmw in #1052 ## New Contributors * @suhasjs made their first contribution in #1022 * @m3hm3t made their first contribution in #1045 **Full Changelog**: v0.51.0...v0.52.0 Co-authored-by: Mark Hildebrand <mhildebrand@microsoft.com>
.Couple of small independent cleanups for PQ.
Move
calculate_chunk_offsets[_auto]toChunkOffsetsas a constructor.These two functions are pure prefix-sum math over
dimensionsandnum_pq_chunks.ChunkOffsetsBase/ChunkOffsetsViewindiskann-quantization::viewsis the natural home for it. I've moved the logic into two constructors -from_dimandfrom_dim_intoforChunkOffsetsandChunkOffsetsViewrespectively to support both allocation patterns.All in-repo call sites have been updated. There might be some overlapping edits with #1010.
Minor changes
QueryComputerandDistanceComputerhad six trampoline impls forwarding&Vec<u8>and&&[u8]arguments to the canonical&[u8]impls.ElementRefin the accessor now allows us to get rid of these! This might be conflicting with Remove unnecessary distance function implementations. #1008get_chunk_from_training_datain pq_construction.rs from public API into tests where it is used.