fix(compaction): prune fully-deleted fragments from the fragment-reuse index #7378

Draft
xuanyu-z wants to merge 1 commit into lance-format:main from xuanyu-z:fix/fri-prune-fully-deleted-fragment

@xuanyu-z

Contributor

Problem

Fixes #7374. A vector (or scalar) index can be left with dangling references after a deferred compaction when a fragment was fully deleted beforehand.

When delete removes every row of a fragment, the fragment is dropped from the manifest at delete time (apply_deletions → FragmentChange::Removed) and gets no deletion file. It therefore never enters a later compaction task, so it is absent from every fragment-reuse-index (FRI) group's old_frags. With defer_index_remap=true the inline remap is skipped, and FragReuseIndex::remap_row_id returns the original address (pass-through) for any address it has no mapping for, including that orphaned fragment's rows. An index that still covers the removed fragment keeps those entries, which resurface as 'take received reference to fragment that does not exist' errors or ghost results once the index is physically remapped through the FRI.
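
To make the pass-through behaviour concrete, here is a minimal sketch in plain Rust. It is a toy model, not the Lance implementation: ToyFragReuseIndex, row_address, and fragment_of are hypothetical names, and the address layout (fragment ID in the upper 32 bits, row offset in the lower 32 bits) is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// Toy stand-in for the fragment-reuse index (hypothetical, not Lance's type).
/// Holds old-address -> new-address entries for fragments that compaction rewrote.
/// A value of `None` means the row was dropped during the rewrite.
struct ToyFragReuseIndex {
    mapping: HashMap<u64, Option<u64>>,
}

/// Assumed layout: fragment ID in the high 32 bits, row offset in the low 32 bits.
fn row_address(fragment_id: u32, row_offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | row_offset as u64
}

/// Recover the fragment ID from an address under the same assumed layout.
fn fragment_of(addr: u64) -> u32 {
    (addr >> 32) as u32
}

impl ToyFragReuseIndex {
    /// Pre-fix behaviour as described above: an address with no mapping
    /// is returned unchanged, even if its fragment no longer exists.
    fn remap_row_id(&self, addr: u64) -> Option<u64> {
        match self.mapping.get(&addr) {
            Some(mapped) => *mapped,
            None => Some(addr), // pass-through
        }
    }
}
```

In this model, if fragment 1 was compacted into fragment 10 and fragment 2 was fully deleted beforehand, remap_row_id(row_address(2, 5)) still yields Some(row_address(2, 5)), which is the dangling reference the PR describes.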

(The inline path is unaffected: it relies on effective_fragment_bitmap — index coverage intersected with live fragments at query time — to mask the dead fragment, and the FRI deferral preserves that data instead of pruning it.)
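
As a rough illustration of the masking described above, the effective bitmap can be modelled as a set intersection. This sketch uses std BTreeSet in place of the real bitmap type, and effective_fragments is a hypothetical helper name, not a Lance API.

```rust
use std::collections::BTreeSet;

/// Toy model of "index coverage intersected with live fragments".
/// Fragments the index covers but the dataset no longer contains drop out.
fn effective_fragments(
    index_coverage: &BTreeSet<u32>,
    live_fragments: &BTreeSet<u32>,
) -> BTreeSet<u32> {
    index_coverage.intersection(live_fragments).copied().collect()
}
```

With coverage {0, 1, 2} and live fragments {0, 1, 3}, the result is {0, 1}: fragment 2 is hidden at query time, but its entries are still stored in the index.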

Fix

Record the orphaned fragment IDs on the FRI version and prune them by fragment ID in remap_row_id:

  • commit_compaction computes orphaned = (union of non-system indices' fragment_bitmap) − (fragments currently in the dataset). At that point the about-to-be-compacted fragments are still present, so only fragments removed before the compaction (e.g. fully-deleted ones) remain.
  • The set is stored on the new FragReuseVersion (new additive, backward-compatible proto field removed_fragments), and FragReuseIndex::remap_row_id returns None for any address whose fragment is in it — fixing both auto-remap-at-load and physical remap.
  • Index coverage bitmaps are intentionally left intact (still masked at query time); only the row data is pruned. Stripping coverage instead regresses the inverted-index unindexed-fallback path.
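
The steps above can be sketched in plain Rust as follows. This is a simplified model under stated assumptions, not the code in this PR: ToyFragReuseVersion, orphaned_fragments, make_addr, and frag_id are hypothetical names, std collections stand in for the real bitmap types, and the address layout (fragment ID in the high 32 bits) is assumed. Only the field name removed_fragments comes from the description.

```rust
use std::collections::{BTreeMap, HashSet};

/// Assumed layout: fragment ID in the high 32 bits, row offset in the low 32 bits.
fn make_addr(fragment_id: u32, row_offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | row_offset as u64
}

fn frag_id(addr: u64) -> u32 {
    (addr >> 32) as u32
}

/// orphaned = (union of index coverage) - (fragments currently in the dataset)
fn orphaned_fragments(
    index_coverages: &[HashSet<u32>],
    dataset_fragments: &HashSet<u32>,
) -> HashSet<u32> {
    index_coverages
        .iter()
        .flatten()
        .copied()
        .filter(|f| !dataset_fragments.contains(f))
        .collect()
}

/// Toy stand-in for one FRI version carrying the recorded orphaned fragments.
struct ToyFragReuseVersion {
    row_mapping: BTreeMap<u64, Option<u64>>,
    removed_fragments: HashSet<u32>,
}

impl ToyFragReuseVersion {
    /// Post-fix behaviour: prune by fragment ID first, then fall back to
    /// the explicit mapping, then to pass-through.
    fn remap(&self, addr: u64) -> Option<u64> {
        if self.removed_fragments.contains(&frag_id(addr)) {
            return None; // fragment was removed before compaction
        }
        match self.row_mapping.get(&addr) {
            Some(mapped) => *mapped,
            None => Some(addr),
        }
    }
}
```

Checking the fragment ID before consulting the row mapping keeps the prune independent of how many rows the removed fragment held, and leaves addresses in untouched live fragments on the pass-through path.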

Testing

  • New regression test test_defer_index_remap_fully_deleted_fragment (10 fragments + IVF_PQ, fully delete one fragment, deferred compact_files, assert the FRI maps that fragment's addresses to None and a search returns no rows from it). Fails before the fix, passes after.
  • The full dataset::optimize suite and the frag-reuse / remapping tests pass locally.

The regression test asserts at the FRI (remap_row_id) level on purpose: an end-to-end search alone is masked by effective_fragment_bitmap and passes even with the bug, so it isn't a reliable detector.

…e index

A fragment that has all its rows deleted is removed from the manifest at
delete time, so it never enters a compaction task and is absent from every
fragment-reuse-index (FRI) group's old_frags. With deferred compaction
(defer_index_remap=true) the inline remap is skipped, so
FragReuseIndex::remap_row_id passed those addresses through unchanged
instead of pruning them. An index that still covered the removed fragment
then kept dangling references to it, surfacing as 'take received reference to
fragment that does not exist' errors / ghost results after a later physical
remap.

Record the orphaned fragment IDs (covered by an index but no longer present
in the dataset, and not part of any rewrite group) on the FRI version and
prune those addresses to None in remap_row_id, fixing both auto-remap at load
and physical remap. Index coverage bitmaps are left intact (masked at query
time by effective_fragment_bitmap); only the row data is pruned. The new proto
field is additive and backward-compatible.

Adds a regression test.

Fixes lance-format#7374
@github-actions

Contributor

Important

This PR touches the Lance format specification.

Substantive changes to the format specification — the .proto definitions
and the spec docs under docs/src/format/ — require a PMC vote before merge.
Minor edits such as typo fixes, wording, or formatting are excluded; use your
judgment.

If this is a meaningful format change:

  • Start a vote following the Lance community voting process.
    Format specification modifications need 3 binding +1 votes (excluding the
    proposer), held on GitHub Discussions, with a minimum voting period of 1 week.
  • Once the vote passes, link the completed vote in this PR. It should not be
    merged until the vote is linked.

@github-actions (bot) added the labels bug (Something isn't working) and A-format (On-disk format: protos and format spec docs), then removed the bug label, on Jun 19, 2026
@codecov

codecov Bot commented Jun 19, 2026

Codecov Report

❌ Patch coverage is 95.20000% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/optimize.rs | 94.28% | 1 Missing and 5 partials ⚠️ |

Labels

A-format On-disk format: protos and format spec docs

Development

Successfully merging this pull request may close these issues.

Vector index can become corrupted when compaction is deferred