
fix(index): drop stale per-segment rows in vector search after in-place column update #7371

Open
wombatu-kun wants to merge 1 commit into lance-format:main from wombatu-kun:fix/vector-index-stale-duplicate-after-column-update

Conversation

@wombatu-kun

Contributor

Closes #7370

Problem

A KNN vector query can return an updated row twice after an in-place column update (LanceFragment.update_columns + LanceOperation.Update) followed by optimize_indices(num_indices_to_merge=0). One copy carries the stale pre-update vector (from the original index segment) and one carries the new value (from the new delta segment). The same stale segment also misranks the row for queries near its old vector.
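The symptom can be reproduced in a toy model that has nothing to do with the real index code. The sketch below assumes each segment is just a dict from row id to the vector it stored at build time, and that per-segment top-k lists are merged by distance with no row-id dedup; `search_segment` and `merged_knn` are hypothetical names, not Lance APIs.

```python
def l2(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_segment(segment, query, k):
    """Return the k nearest (distance, row_id) pairs from one segment."""
    hits = [(l2(vec, query), row_id) for row_id, vec in segment.items()]
    return sorted(hits)[:k]

def merged_knn(segments, query, k):
    """Merge per-segment top-k lists by distance, with no row-id dedup."""
    hits = []
    for seg in segments:
        hits.extend(search_segment(seg, query, k))
    return sorted(hits)[:k]

# Row 1 was updated in place: the old segment still holds its pre-update
# vector, and the delta segment holds the new one.
old_segment = {0: (0.0, 0.0), 1: (1.0, 1.0), 2: (5.0, 5.0)}
delta_segment = {1: (1.2, 1.2)}

result = merged_knn([old_segment, delta_segment], query=(1.1, 1.1), k=3)
row_ids = [row_id for _, row_id in result]
print(row_ids.count(1))  # row 1 appears once per segment that holds a copy
```

In this model row 1 occupies two of the three result slots, one per segment, which mirrors the duplicate described above.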

Root cause

Index coverage is tracked per fragment via IndexMetadata.fragment_bitmap. An in-place column update keeps the fragment id and row address, and committing the Update prunes the fragment from the old segment's bitmap, but the old segment's index file still physically contains the row's old vector. A delta optimize then builds a new segment covering that fragment with the new vector while leaving the old segment in place. At query time the shared DatasetPreFilter is built from the union of all segment bitmaps, so it cannot tell "fragment N is valid for the delta segment but stale for the old segment", and there is no cross-segment row-id dedup on the vector path.
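A small sketch of why the shared filter cannot help, assuming a row address packs the fragment id into the upper 32 bits and the row offset into the lower 32 (stated here as an assumption of the model). Plain Python sets stand in for the fragment bitmaps, and all function names are hypothetical.

```python
FRAG_SHIFT = 32  # assumed layout: fragment id in the high bits of the address

def row_address(fragment_id, offset):
    return (fragment_id << FRAG_SHIFT) | offset

def fragment_of(addr):
    return addr >> FRAG_SHIFT

# After the Update commit, fragment 2 is pruned from the old segment's bitmap.
# The delta optimize then builds a new segment that covers fragment 2.
old_bitmap = {0, 1}
delta_bitmap = {2}

# The shared prefilter only sees the union of all segment bitmaps.
shared_allowed = old_bitmap | delta_bitmap

def shared_prefilter_accepts(addr):
    return fragment_of(addr) in shared_allowed

def segment_owns(bitmap, addr):
    return fragment_of(addr) in bitmap

# The old segment's index file still physically contains this row.
stale_hit = row_address(2, 7)

print(shared_prefilter_accepts(stale_hit))   # accepted regardless of which segment produced it
print(segment_owns(old_bitmap, stale_hit))   # the old segment does not own fragment 2
print(segment_owns(delta_bitmap, stale_hit)) # the delta segment does
```

The union answers "is this fragment indexed by anyone?", while the question that matters is "is it indexed by the segment that produced this hit?".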

Fix

Restrict each index segment's vector-search output to the fragments it actually owns, in ANNIvfSubIndexExec (knn.rs):

  1. Per-segment post-filter: after a segment's partition search, drop rows whose fragment is not in that segment's fragment_bitmap, using a mask from the existing DatasetPreFilter::create_restricted_deletion_mask (correct for both row-address and stable-row-id datasets). This removes the stale duplicate served by the old segment.
  2. Per-owner shortcut emission: the late_search "fewer than k results" shortcut returns prefilter-matched rows the partition search did not reach. It previously emitted the whole set from the first delta. With a per-segment restriction active, each delta now emits only the not-found addresses its own segment owns (segments partition the fragments, so each is emitted exactly once and stays deterministic). Without a restriction the original first-delta-only path is unchanged.
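The per-segment post-filter in item 1 can be sketched as follows. This is a simplified model rather than the code in knn.rs: hits are `(distance, row_address)` tuples, the bitmap is a Python set, a `None` bitmap means "no restriction", and `restrict_to_owned` is a hypothetical name. The same 32-bit fragment shift is assumed.

```python
def fragment_of(row_addr):
    # Assumed layout: fragment id in the upper 32 bits of the row address.
    return row_addr >> 32

def restrict_to_owned(hits, fragment_bitmap):
    """Keep only hits whose fragment is owned by the segment that produced them."""
    if fragment_bitmap is None:
        # No restriction available: pass everything through unchanged.
        return list(hits)
    return [(dist, addr) for dist, addr in hits
            if fragment_of(addr) in fragment_bitmap]

updated = (2 << 32) | 7   # row updated in place, lives in fragment 2
other = (0 << 32) | 3     # untouched row in fragment 0

old_hits = [(0.02, updated), (2.4, other)]   # old segment still serves the stale copy
delta_hits = [(0.01, updated)]               # delta segment serves the fresh copy
old_bitmap, delta_bitmap = {0, 1}, {2}

merged = sorted(
    restrict_to_owned(old_hits, old_bitmap)
    + restrict_to_owned(delta_hits, delta_bitmap)
)
addresses = [addr for _, addr in merged]
```

With the restriction applied per segment, the updated row survives only in the delta segment's output, so the merged list contains it once.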

Both are gated on every segment having a fragment_bitmap and are pass-through no-ops for normal append/merge indices.
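The per-owner shortcut emission and its gating can be modelled in a few lines. This is a sketch under stated assumptions: `shortcut_emit` is a hypothetical helper, `not_found` is the list of prefilter-matched addresses the partition search did not reach, and each segment's bitmap is a set, or `None` when the segment has no `fragment_bitmap`.

```python
def shortcut_emit(not_found, segment_bitmaps, delta_idx):
    """Addresses that delta `delta_idx` emits from the not-found set."""
    restricted = all(b is not None for b in segment_bitmaps)
    if not restricted:
        # Ungated path: the first delta emits the whole set, others emit nothing.
        return sorted(not_found) if delta_idx == 0 else []
    owned = segment_bitmaps[delta_idx]
    # Gated path: each delta emits only addresses in fragments it owns.
    return sorted(a for a in not_found if (a >> 32) in owned)

not_found = [(2 << 32) | 7, (0 << 32) | 3, (1 << 32) | 5]

# Every segment has a bitmap, so emission is split by ownership.
bitmaps = [{0, 1}, {2}]
per_delta = [shortcut_emit(not_found, bitmaps, i) for i in range(len(bitmaps))]

# One segment lacks a bitmap, so the first-delta-only behaviour applies.
legacy = [shortcut_emit(not_found, [None, {2}], i) for i in range(2)]
```

Because the bitmaps partition the fragments in this model, concatenating `per_delta` yields every not-found address exactly once, independent of which delta runs first.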

Tests

Adds test_no_stale_duplicate_after_partial_column_update to python/python/tests/test_vector_index.py, which reproduces the multi-segment duplicate and asserts the updated row is returned exactly once (fails before the fix with two copies, passes after). The existing Rust test_fewer_than_k_results (multi-delta prefilter shortcut) and the #6877 single-segment masking guard continue to pass.

Notes

The per-segment mask is reused from create_restricted_deletion_mask, so it is None (pass-through) for normal single-segment or fully-merged indices and adds no work there.

One narrow limitation remains: the early/late-search coordinator counts a segment's rows before the post-filter drops stale ones. In the rare combination of a selective prefilter plus stale rows from an in-place update where the fresh row lives in an unsearched partition, the owning segment's shortcut may skip re-emitting it (a small recall reduction, never a duplicate or a wrong row). Removing this would require pushing the per-segment restriction into the shared, cross-delta prefilter and its shortcut, a much larger change.
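The limitation can be illustrated with a deliberately simplified coordinator. The sketch assumes the "fewer than k" shortcut is triggered purely by the number of rows counted before the post-filter runs; `run_query` is a hypothetical stand-in, not the real early/late-search logic.

```python
def run_query(segment_hits, segment_bitmaps, k):
    """Return (shortcut_triggered, rows kept after the per-segment post-filter)."""
    # The coordinator counts rows before stale ones are dropped.
    counted = sum(len(hits) for hits in segment_hits)
    shortcut_triggered = counted < k
    kept = []
    for hits, bitmap in zip(segment_hits, segment_bitmaps):
        kept.extend(a for a in hits if (a >> 32) in bitmap)
    return shortcut_triggered, kept

valid = (0 << 32) | 1
stale = (2 << 32) | 7   # old copy of a row whose fragment now belongs to the delta

# The old segment returns one valid row and one stale row. The delta segment
# returns nothing because the fresh copy sits in a partition it did not search.
segment_hits = [[valid, stale], []]
segment_bitmaps = [{0, 1}, {2}]

shortcut_triggered, kept = run_query(segment_hits, segment_bitmaps, k=2)
```

Here the stale row satisfies the count, the shortcut is not taken, and the post-filter then leaves one row for a k of 2: a recall loss, with no duplicate and no wrong row.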

@github-actions bot added the labels A-python (Python bindings) and bug (Something isn't working) on Jun 19, 2026
@codecov

codecov Bot commented Jun 19, 2026


Codecov Report

❌ Patch coverage is 86.53846% with 7 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
rust/lance/src/io/exec/knn.rs | 86.53% | 5 Missing and 2 partials ⚠️


@Xuanwo Xuanwo left a comment

Collaborator


  1. Segment restrictions are applied only after early and late vector-search accounting has already counted stale rows. Stale candidates can consume the shared search budget and leave fewer than k valid results.
  2. The commit metadata includes a disallowed co-author trailer. This violates the repository commit policy.

@wombatu-kun force-pushed the fix/vector-index-stale-duplicate-after-column-update branch from cd6d6ef to 6729af7 on June 20, 2026 01:59
@wombatu-kun

Contributor Author
  1. This is the narrow limitation noted in the PR description. Even so, I think the change is better merged than withheld, because the amount of useful information returned does not change: the set of unique, current vectors is the same as before, and the per-segment restriction removes only the stale duplicate copies (the outdated rows the old segment should no longer serve). The single downside is a rare, bounded recall dip, limited to a selective prefilter coinciding with an in-place-update stale row whose fresh copy sits in an unsearched partition; it never produces a duplicate or a wrong row. Because the restriction is a post-filter, a stale candidate can occupy a k-slot in the early/late accounting and then be dropped. Eliminating even that means moving the restriction into the shared cross-delta prefilter and its shortcut accounting, a substantially larger change. I can open a follow-up issue to track it if you would prefer.
  2. Done in 6729af7.

@wombatu-kun wombatu-kun requested a review from Xuanwo June 20, 2026 02:05
// Drop stale rows from this segment's search output (rows whose
// fragment the segment no longer owns). The shortcut path in
// late_search restricts its emitted rows with the same mask.
let restricted = combined.map(move |batch_res| {

Collaborator


This path applies the segment restriction only after the shared early/late search accounting has already counted the unmasked results. A stale hit can satisfy the coordinator and then be dropped, so the query can return fewer than k valid rows or miss the fresh row from the owning segment.


Development

Successfully merging this pull request may close these issues.

Vector search returns an updated row twice after update_columns + optimize_indices

2 participants