fix: compute SQ dot distance from dequantized values by cwj0bzxg · Pull Request #7355 · lance-format/lance

cwj0bzxg · 2026-06-18T08:33:51Z

This PR fixes Dot distance computation for scalar-quantized vectors. #7352

Previously, the SQ Dot path computed the dot product directly from u8 quantized codes and only applied a scale factor. This is incorrect when SQ uses a non-zero lower_bound, because each code represents an offset value:

value ≈ lower_bound + step * code

As a result, the old Dot path missed the offset-related terms and could produce a very different ranking from the actual vector values. This caused severe recall degradation for SQ indexes with metric="dot".
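A minimal sketch of how the ranking can flip, using plain Rust rather than the lance code. The function names are hypothetical, and the `step * step` scale in `code_only_dot` is an assumption about the old scale factor; any positive scale gives the same ranking.

```rust
/// Dequantize one SQ code: value ≈ lower_bound + step * code.
fn dequantize(code: u8, lower_bound: f32, step: f32) -> f32 {
    lower_bound + step * code as f32
}

/// Code-only dot: raw u8 codes multiplied together, then scaled.
/// This ignores lower_bound entirely (assumed shape of the old path).
fn code_only_dot(a: &[u8], b: &[u8], step: f32) -> f32 {
    let raw: u32 = a.iter().zip(b).map(|(&x, &y)| x as u32 * y as u32).sum();
    step * step * raw as f32
}

/// Reference: dot product of the dequantized values.
fn dequantized_dot(a: &[u8], b: &[u8], lower_bound: f32, step: f32) -> f32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| dequantize(x, lower_bound, step) * dequantize(y, lower_bound, step))
        .sum()
}
```

With `lower_bound = -1.0` and `step = 1.0`, the codes `x1 = [0, 0]`, `x2 = [2, 2]` and query `q = [1, 0]` decode to `[-1, -1]`, `[1, 1]` and `[0, -1]`. The code-only dot prefers `x2`, while the dequantized dot prefers `x1`.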

The fix computes SQ Dot using the full expansion:

dot(x, q) ≈ sum_i((lower_bound + step * cx_i) * (lower_bound + step * cq_i))

Equivalently:

dot =
    step² · sum_i(cx_i · cq_i)
  + lower_bound·step · sum_i(cx_i)
  + lower_bound·step · sum_i(cq_i)
  + dim·lower_bound²

The Dot distance is then distance = 1 - dot, matching the existing Dot distance convention.
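The expansion above can be sketched in standalone Rust as follows. This is an illustration under assumptions, not the code in this PR: `sq_dot_distance` and its helpers are hypothetical names, and the integer sums are widened to `u32`, which is enough for typical dimensions.

```rust
/// Sum of the u8 codes, widened to u32 to avoid overflow.
fn code_sum(codes: &[u8]) -> u32 {
    codes.iter().map(|&c| c as u32).sum()
}

/// Dot product of two u8 code vectors, accumulated in u32.
fn code_dot(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(&x, &y)| x as u32 * y as u32).sum()
}

/// Dot distance from SQ codes using the four-term expansion:
/// step² · Σ cx·cq + lower_bound·step · (Σ cx + Σ cq) + dim · lower_bound².
fn sq_dot_distance(x: &[u8], q: &[u8], lower_bound: f32, step: f32) -> f32 {
    let dim = x.len() as f32;
    let dot = step * step * code_dot(x, q) as f32
        + lower_bound * step * (code_sum(x) + code_sum(q)) as f32
        + dim * lower_bound * lower_bound;
    // Same convention as stated above: distance = 1 - dot.
    1.0 - dot
}
```

For a fixed query, `code_sum(q)` does not depend on the indexed vector, so it can be computed once per query.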

Changes

- Fix SQ Dot distance to include the lower_bound offset terms.
- Keep the existing SQ L2 and Cosine paths unchanged.
- Cache / compute the SQ query code sum needed by the Dot formula.
- Add Rust unit tests for:
  - Dot distance from a float query.
  - Dot distance from an indexed vector id.
  - Constant-bound SQ behavior.

Validation

I rebuilt the Python extension with this patch and reran MSMARCO WebSearch 1M Dot benchmarks.

Dataset:

- 1M base vectors
- 9,376 queries
- dimension 768
- metric: Dot
- evaluated with recall@10

Before this fix, IVF_HNSW_SQ recall@10 was only around 0.0250 to 0.0684 across ef=20..640, while the IVF_HNSW_FLAT baseline reached 0.5179 to 0.9377.

After this fix:

IVF_HNSW_SQ

ef	QPS	recall@10
20	944.54	0.5176
40	880.87	0.6455
80	762.09	0.7448
160	607.29	0.8198
320	453.09	0.8716
640	305.57	0.9040

IVF_SQ

nprobes	QPS	recall@10
16	578.43	0.6392
32	417.61	0.7352
64	272.83	0.8104
96	205.88	0.8456
128	165.96	0.8684

IVF_HNSW_FLAT (baseline)

ef	QPS	recall@10
20	886.27	0.5195
40	773.42	0.6507
80	627.24	0.7563
160	470.82	0.8390
320	315.20	0.8983
640	193.39	0.9373

With the corrected distance formula, SQ Dot recall is restored to the same range as the Flat baseline. IVF_HNSW_SQ recall@10 is within 0.002 of IVF_HNSW_FLAT at ef=20 (0.5176 vs 0.5195) and about 0.033 below it at ef=640 (0.9040 vs 0.9373), while providing higher QPS at every ef tested. The remaining recall gap at high ef is expected from quantization loss rather than an incorrect distance formula.

Xuanwo · 2026-06-19T18:45:44Z

+    lower_bound: f32,
+) -> f32 {
+    let code_dot = dot_u8(sq_code, query_sq_code) as f32;
+    let dot = step * step * code_dot


This expanded affine dot calculation is performed in f32, and the large offset terms can cancel in high-dimensional near-zero vectors enough to flip SQ Dot rankings.
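One possible way to reduce the cancellation described here, sketched under assumptions: accumulate the code sums exactly as integers, combine the four terms in f64, and round to f32 only at the end. `sq_dot_f64` is a hypothetical name, and this is not necessarily the change the reviewer has in mind.

```rust
/// Hypothetical variant of the expanded SQ dot: exact integer
/// accumulation, f64 combination, a single final rounding to f32.
fn sq_dot_f64(x: &[u8], q: &[u8], lower_bound: f32, step: f32) -> f32 {
    let code_dot: u64 = x.iter().zip(q).map(|(&a, &b)| a as u64 * b as u64).sum();
    let sum_x: u64 = x.iter().map(|&c| c as u64).sum();
    let sum_q: u64 = q.iter().map(|&c| c as u64).sum();

    let (lb, s, dim) = (lower_bound as f64, step as f64, x.len() as f64);
    // The large positive and negative terms cancel here in f64,
    // which has far more mantissa bits than f32.
    let dot = s * s * code_dot as f64 + lb * s * (sum_x + sum_q) as f64 + dim * lb * lb;
    dot as f32
}
```

With `lower_bound = -10.0` and `step = 1.0`, two 768-dimensional vectors whose codes are all 10 decode to all zeros; the three groups of terms are 76800, -153600 and 76800, and they cancel exactly.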

Xuanwo · 2026-06-19T18:45:44Z

+            ],
+        );
+        let storage =
+            ScalarQuantizationStorage::try_new(8, DistanceType::Dot, -10.0..245.0, [batch], None)


The new coverage only exercises unit-step and constant bounds, so regressions in the step and step * step terms can pass while arbitrary-range SQ Dot distances remain wrong.
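A sketch of the kind of check that would exercise a non-unit step, written as standalone Rust so it does not depend on the `ScalarQuantizationStorage` API. All names are hypothetical, and the step definition `(upper - lower) / 255` is an assumption for an 8-bit quantizer.

```rust
/// Assumed step size for an 8-bit quantizer over [lower, upper].
fn sq_step(lower: f32, upper: f32) -> f32 {
    (upper - lower) / 255.0
}

/// Expanded affine dot computed from codes.
fn expanded_dot(x: &[u8], q: &[u8], lower: f32, step: f32) -> f32 {
    let code_dot: u32 = x.iter().zip(q).map(|(&a, &b)| a as u32 * b as u32).sum();
    let sums: u32 = x.iter().chain(q).map(|&c| c as u32).sum();
    step * step * code_dot as f32 + lower * step * sums as f32 + x.len() as f32 * lower * lower
}

/// Reference dot computed from the dequantized values.
fn reference_dot(x: &[u8], q: &[u8], lower: f32, step: f32) -> f32 {
    x.iter()
        .zip(q)
        .map(|(&a, &b)| (lower + step * a as f32) * (lower + step * b as f32))
        .sum()
}

/// Returns (expanded, reference) for a fixed non-unit-step case.
fn non_unit_step_case() -> (f32, f32) {
    let (lower, upper) = (-0.5f32, 1.5f32);
    let step = sq_step(lower, upper); // 2/255, so step and step² both matter
    let x = [0u8, 128, 255, 64];
    let q = [255u8, 32, 0, 200];
    (expanded_dot(&x, &q, lower, step), reference_dot(&x, &q, lower, step))
}
```

Because the step is far from 1 in this case, an implementation that drops or misapplies the `step` or `step * step` factor would disagree with the reference by a wide margin.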

fix: compute SQ dot distance from dequantized values

3538dfb

github-actions bot added the A-index and bug labels on Jun 18, 2026

Xuanwo reviewed Jun 19, 2026

cwj0bzxg wants to merge 1 commit into lance-format:main from cwj0bzxg:fix-sq-dot-dequantized
