Skip to content

[PQ Cleanup] Part 2: Consolidate calculate_chunk_offsets* #976

Merged
arkrishn94 merged 17 commits into
mainfrom
u/adkrishnan/pq-cleanup-3
May 11, 2026
Merged

[PQ Cleanup] Part 2: Consolidate calculate_chunk_offsets* #976
arkrishn94 merged 17 commits into
mainfrom
u/adkrishnan/pq-cleanup-3

Conversation

@arkrishn94
Copy link
Copy Markdown
Contributor

@arkrishn94 arkrishn94 commented Apr 24, 2026

.Couple of small independent cleanups for PQ.

Move calculate_chunk_offsets[_auto] to ChunkOffsets as a constructor.

These two functions are pure prefix-sum math over dimensions and num_pq_chunks. ChunkOffsetsBase / ChunkOffsetsView in diskann-quantization::views is the natural home for it. I've moved the logic into two constructors - from_dim and from_dim_into for ChunkOffsets and ChunkOffsetsView respectively to support both allocation patterns.

All in-repo call sites have been updated. There might be some overlapping edits with #1010.

Minor changes

  • QueryComputer and DistanceComputer had six trampoline impls forwarding &Vec<u8> and &&[u8] arguments to the canonical &[u8] impls. ElementRef in the accessor now allows us to get rid of these! This might be conflicting with Remove unnecessary distance function implementations. #1008
  • Move get_chunk_from_training_data in pq_construction.rs from public API into tests where it is used.

Relocates the object pool module so that it is available to crates that depend on diskann-utils but not diskann (notably diskann-quantization, which will gain pool-aware distance-table allocation in a follow-up). diskann::utils::object_pool stays as a re-export for backwards compatibility.

Direct importers in diskann-providers, diskann-disk, and diskann-garnet are switched to use diskann_utils::object_pool directly. Internal diskann users continue to use the re-export.
@arkrishn94 arkrishn94 requested review from a team and Copilot April 24, 2026 18:50
@arkrishn94 arkrishn94 changed the title {PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls and [PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls and Apr 24, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Relocates PQ-related chunk offset helpers into diskann-quantization, centralizes the cosine→L2 fallback rationale for disk PQ preprocessing, and removes redundant distance trampoline impls by leaning on accessor element refs.

Changes:

  • Moved calculate_chunk_offsets[_auto] from diskann-providers into diskann-quantization::views and updated call sites.
  • Migrated object_pool usage to diskann-utils::object_pool across crates and removed diskann::utils::object_pool.
  • Deduplicated cosine/L2 fallback commentary and removed redundant distance impls.

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
diskann/src/utils/mod.rs Removes utils::object_pool module exposure.
diskann/src/graph/search/scratch.rs Switches AsPooled import to diskann_utils.
diskann/src/graph/index.rs Switches ObjectPool/PooledRef import to diskann_utils.
diskann-utils/src/lib.rs Exposes object_pool module publicly from diskann-utils.
diskann-quantization/src/views.rs Adds calculate_chunk_offsets[_auto] helpers.
diskann-providers/src/model/pq/pq_construction.rs Removes local chunk-offset helpers and imports from quantization crate.
diskann-providers/src/model/pq/mod.rs Stops re-exporting chunk-offset helpers from providers PQ module.
diskann-providers/src/model/pq/distance/test_utils.rs Updates import path for calculate_chunk_offsets_auto.
diskann-providers/src/model/pq/distance/l2.rs Switches object pool import to diskann_utils.
diskann-providers/src/model/pq/distance/innerproduct.rs Switches object pool import to diskann_utils.
diskann-providers/src/model/pq/distance/dynamic.rs Switches object pool import and removes redundant trait impls.
diskann-providers/src/model/pq/distance/common.rs Switches object pool import to diskann_utils.
diskann-providers/src/model/mod.rs Removes re-export of calculate_chunk_offsets_auto.
diskann-providers/src/model/graph/provider/async_/memory_quant_vector_provider.rs Switches object pool import to diskann_utils.
diskann-providers/src/model/graph/provider/async_/fast_memory_quant_vector_provider.rs Switches object pool import and updates tests for new argument types.
diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs Switches object pool import to diskann_utils.
diskann-garnet/src/provider.rs Switches object pool imports to diskann_utils.
diskann-disk/src/search/provider/disk_provider.rs Switches object pool imports to diskann_utils.
diskann-disk/src/search/pq/quantizer_preprocess.rs Centralizes cosine→L2 fallback rationale into module docs.
diskann-benchmark/src/backend/exhaustive/product.rs Updates chunk-offset helper call site to diskann-quantization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread diskann-quantization/src/views.rs Outdated
Comment thread diskann/src/utils/mod.rs
Comment thread diskann-providers/src/model/pq/mod.rs
Comment thread diskann-disk/src/search/pq/quantizer_preprocess.rs Outdated
Comment thread diskann-quantization/src/views.rs Outdated
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 93.27731% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.47%. Comparing base (65857ad) to head (84e97e4).
⚠️ Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
diskann-providers/src/model/pq/views.rs 0.00% 6 Missing ⚠️
diskann-quantization/src/views.rs 97.43% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #976      +/-   ##
==========================================
- Coverage   89.55%   89.47%   -0.08%     
==========================================
  Files         459      461       +2     
  Lines       85006    85559     +553     
==========================================
+ Hits        76131    76558     +427     
- Misses       8875     9001     +126     
Flag Coverage Δ
miri 89.47% <93.27%> (-0.08%) ⬇️
unittests 89.32% <93.27%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...iskann-benchmark/src/backend/exhaustive/product.rs 100.00% <ø> (ø)
...ovider/async_/fast_memory_quant_vector_provider.rs 98.46% <100.00%> (-0.01%) ⬇️
diskann-providers/src/model/pq/distance/dynamic.rs 94.11% <ø> (+2.81%) ⬆️
...kann-providers/src/model/pq/distance/test_utils.rs 100.00% <100.00%> (ø)
diskann-providers/src/model/pq/pq_construction.rs 93.73% <100.00%> (-0.18%) ⬇️
diskann-quantization/src/views.rs 98.29% <97.43%> (-0.32%) ⬇️
diskann-providers/src/model/pq/views.rs 89.09% <0.00%> (-10.91%) ⬇️

... and 43 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@arkrishn94 arkrishn94 changed the title [PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls and [PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls Apr 25, 2026
Comment thread diskann-utils/src/views.rs Outdated
@arkrishn94 arkrishn94 changed the title [PQ Cleanup] Part 2: Relocate calculate_chunk_offsets* and remove redundant distance impls [PQ Cleanup] Part 2: Consolidate calculate_chunk_offsets* and accum_row_inplace May 4, 2026
Copy link
Copy Markdown
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out you updated the code midway through the review 😄

That said, I think my comments still hold.

Comment thread diskann-quantization/src/views.rs Outdated
Comment thread diskann-quantization/src/views.rs
Comment thread diskann-utils/src/views.rs Outdated
@arkrishn94 arkrishn94 requested a review from a team May 5, 2026 13:59
Copy link
Copy Markdown
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just one more quick comment about the error type for the new constructors then looks fine to me. I also still prefer partition/partition_in for the function names since it's partitioning dim into the requested number of chunks.

Comment thread diskann-quantization/src/views.rs Outdated
Comment thread diskann-quantization/src/views.rs
Comment thread diskann-providers/src/model/pq/mod.rs
@arkrishn94 arkrishn94 enabled auto-merge (squash) May 11, 2026 13:46
@arkrishn94 arkrishn94 changed the title [PQ Cleanup] Part 2: Consolidate calculate_chunk_offsets* and accum_row_inplace [PQ Cleanup] Part 2: Consolidate calculate_chunk_offsets* May 11, 2026
@arkrishn94 arkrishn94 merged commit d516da1 into main May 11, 2026
26 checks passed
@arkrishn94 arkrishn94 deleted the u/adkrishnan/pq-cleanup-3 branch May 11, 2026 13:47
hildebrandmw added a commit that referenced this pull request May 12, 2026
# DiskANN v0.52.0 Release Notes

## Breaking Changes

An AI generated, human reviewed list of changes is summarized below.

### `get_degree_stats` signature changed
([#998](#998))

`DiskANNIndex::get_degree_stats` now takes an explicit iterator of IDs
instead of requiring the data provider to implement `IntoIterator`.

```rust
// Before — provider had to impl IntoIterator
index.get_degree_stats(&mut accessor)?;

// After — caller supplies the ID iterator
index.get_degree_stats(&mut accessor, id_iter)?;
```

### PQ dimension contract tightened; entries now `&[f32]` only
([#1044](#1044))

With `AlignedBoxWithSlice` removed from the PQ path, the dimension
handling has been refactored into a three-layer contract:

| Layer | Where | Contract |
|---|---|---|
| **Boundary (inmem)** | `QueryComputer::new`,
`MultiQueryComputer::new`, `DistanceComputer::evaluate_similarity` |
`len == dim` (returns `Err` on mismatch) |
| **Boundary (disk)** | `PQScratch::set` | `len >= dim`, slices to
`[..dim]` |
| **Internal** | `TableL2/IP/Cosine::{new, populate}` | Trusted — no
re-validation |

**Other changes:**
- PQ table populate/distance methods now accept `&[f32]` instead of `<U:
Into<f32>>`. Callers must pre-decode quantized vectors via
`VectorRepr::as_f32`.
- Generic trampoline impls (`&Vec<u8>`, `&&[u8]`) on `QueryComputer` /
`DistanceComputer` have been removed.
### `calculate_chunk_offsets` relocated to `ChunkOffsets` constructors
([#976](#976))

The free functions `calculate_chunk_offsets` and
`calculate_chunk_offsets_auto` have been moved into constructors on
`ChunkOffsets` / `ChunkOffsetsView` in `diskann-quantization::views`.

```rust
// Before
let offsets = calculate_chunk_offsets(dim, num_chunks);

// After (allocating)
let offsets = ChunkOffsets::partition(dim, num_chunks)?;

// After (zero-alloc, borrows caller-owned scratch)
let view = ChunkOffsetsView::partition_into(dim, &mut scratch)?;
```

Additionally, `get_chunk_from_training_data` has been moved from public
API.

### `CachingProvider` removed
([#1052](#1052))

The entire `diskann_providers::model::graph::provider::async_::caching`
module has been deleted.

**Why:** The `CachingProvider` was an experiment in transparent caching
over `DataProvider`. In practice it required double monomorphization of
the indexing code, didn't save integration work for bulk methods like
`on_elements_unordered`/`distances_unordered`, and was complex to
maintain. An internal user who …migrated off it removed ~1,000 lines of
code, improved compile times by ~20%, and substantially reduced
complexity.

**Upgrade:** Manage caching directly in your `DataProvider`
implementation.

## New Features

### AVX-512 4-bit distance kernels
([#1045](#1045))

Native V4 (AVX-512) specializations for 4-bit packed vector distance
computations:

- **`SquaredL2`** — 16 × `u32` lanes per iteration via
`_mm512_madd_epi16`.
- **`InnerProduct`** — AVX-512 VNNI (`_mm512_dpbusd_epi32`) over `u8x64`
/ `i8x64` operands.

Previously, V4 hardware fell back to two AVX2 (V3) kernel invocations
per 512-bit chunk. The native kernels double per-instruction throughput.
No API changes — existing code benefits automatically on AVX-512 capable
hardware.

## Merged PRs
* Deprecate 32-bit targets by @suhasjs in
#1022
* Add a fast path to `Map::prepare`. by @hildebrandmw in
#1023
* Add boundary checks in gen_associated_data_from_range() by @Copilot in
#847
* [deps] Don't pull `rayon` as a dependency of `diskann`. by
@hildebrandmw in #1024
* Bump openssl from 0.10.78 to 0.10.79 by @dependabot[bot] in
#1026
* Cleaning up test work and changing the get_degree_stats signature. by
@JordanMaples in #998
* Reduce scalar-quantization benchmark monomorphization by
@suri-kumkaran in #1041
* [diskann-vector] Support truly unaligned distances. by @hildebrandmw
in #981
* rename spherical.json to graph index with spherical quantization by
@harsha-simhadri in #1042
* [PQ Cleanup] Part 2: Consolidate `calculate_chunk_offsets*` by
@arkrishn94 in #976
* PQ: tighten dim contract; right-size scratch buffer by @wuw92 in
#1044
* Add v4 distance kernels (4-bit SquaredL2 / InnerProduct) by @m3hm3t in
#1045
* Remove the Caching Provider by @hildebrandmw in
#1052

## New Contributors
* @suhasjs made their first contribution in
#1022
* @m3hm3t made their first contribution in
#1045

**Full Changelog**:
v0.51.0...v0.52.0

Co-authored-by: Mark Hildebrand <mhildebrand@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants