bug: index out of bounds panic in inverted index builder when token_id > posting_lists.len() #7313

@sinianluoye

Description

IndexWorker::process_batch() in rust/lance-index/src/scalar/inverted/builder.rs panics with index out of bounds during table.optimize() when the FTS inverted index is rebuilt with position tracking enabled (with_position: true).

Root Cause

In the position-tracking branch of process_batch, tokens.add() returns a token ID, and the code then indexes posting_lists[token_id]. The posting_lists vector grows by a single element, and only when token_id == posting_lists.len(). However, tokens.add() can return an ID greater than the current length (e.g. when the token set was pre-populated from an existing index partition), in which case the guard is skipped and the index is out of range:

// Line 1325 in current main (line 856 in version at commit d88dde4):
let token_id = builder.tokens.add(token_text);
if token_id as usize == builder.posting_lists.len() {  // BUG: only covers token_id == len; larger IDs skip the guard
    builder.posting_lists.push(
        PostingListBuilder::new_with_posting_tail_codec(true, posting_tail_codec),
    );
}
let posting_list = &mut builder.posting_lists[token_id as usize];  // PANICS here

The non-position branch (below) correctly handles this by calling posting_lists.resize_with(tokens.len(), ...) before accessing by index.
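The failure mode described above can be sketched without any Lance types. In the sketch below, TokenSet, preloaded, and slot_exists_with_eq_guard are hypothetical stand-ins written for illustration, not the actual builder API; the only assumption carried over from the report is that add() returns the existing ID for a token that is already known.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the builder's token set: maps token text to an ID.
struct TokenSet {
    ids: HashMap<String, u32>,
    next_id: u32,
}

impl TokenSet {
    /// Simulates a token set pre-populated from an existing index partition.
    fn preloaded(tokens: &[&str]) -> Self {
        let mut set = TokenSet { ids: HashMap::new(), next_id: 0 };
        for t in tokens {
            set.add(t);
        }
        set
    }

    /// Returns the existing ID for a known token, or assigns the next free ID.
    fn add(&mut self, token: &str) -> u32 {
        if let Some(&id) = self.ids.get(token) {
            return id;
        }
        let id = self.next_id;
        self.ids.insert(token.to_string(), id);
        self.next_id += 1;
        id
    }
}

/// Applies the `==` guard to a plain Vec and reports whether the slot for
/// `token_id` exists afterwards. Uses `get` so the gap case returns false
/// instead of panicking.
fn slot_exists_with_eq_guard(posting_lists: &mut Vec<Vec<u64>>, token_id: u32) -> bool {
    if token_id as usize == posting_lists.len() {
        posting_lists.push(Vec::new());
    }
    posting_lists.get(token_id as usize).is_some()
}
```

With a token set preloaded with four tokens and an empty posting_lists vector, adding the last preloaded token again yields ID 3, the guard does not fire, and the slot is missing. That is the same shape as the reported panic, where the length is 1731 and the index is 4456.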

Reproduction

Observed on three LanceDB tables during table.optimize():

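| Table           | posting_lists.len() | token_id | Operation |
|-----------------|---------------------|----------|-----------|
| conversations   | 1731                | 4456     | optimize  |
| cortex_engrams  | 219                 | 1663     | optimize  |
| cortex_entities | 2398                | 2719     | optimize  |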

All three tables have FTS indexes with with_position: true (the default). The panic triggers when an existing token's ID exceeds the current length of posting_lists because the token set was loaded from an old index partition during optimize.

Fix

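Change == to >= on the resize guard, and grow the vector far enough to cover token_id:

- if token_id as usize == builder.posting_lists.len() {
+ if token_id as usize >= builder.posting_lists.len() {

Changing the comparison alone is not sufficient. The guarded body pushes a single element, so with the values observed above (length 1731, token_id 4456) the vector would grow to 1732 and the access would still panic. The body also needs to extend posting_lists to at least token_id + 1 entries, for example with resize_with instead of push. This matches the non-position branch, which resizes to tokens.len() before indexing and therefore handles the gap case.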
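As a rough illustration of the gap case, the sketch below compares a >= guard that pushes one element with a guard that resizes up to token_id + 1. Both functions are hypothetical helpers operating on plain vectors, not code from builder.rs; in the real builder the fill closure would construct a PostingListBuilder.

```rust
/// A `>=` guard that still pushes only one element. Reports whether the slot
/// for `token_id` exists afterwards (uses `get`, so it never panics).
fn slot_exists_with_ge_push(lists: &mut Vec<Vec<u64>>, token_id: u32) -> bool {
    if token_id as usize >= lists.len() {
        lists.push(Vec::new());
    }
    lists.get(token_id as usize).is_some()
}

/// Hypothetical helper: grow `lists` so that `lists[token_id]` exists, filling
/// any gap with values produced by `make`, then return that slot.
fn slot_for<T, F>(lists: &mut Vec<T>, token_id: u32, make: F) -> &mut T
where
    F: FnMut() -> T,
{
    let idx = token_id as usize;
    if idx >= lists.len() {
        // resize_with fills every missing index, not just the next one.
        lists.resize_with(idx + 1, make);
    }
    &mut lists[idx]
}
```

Starting from a vector of length 2 and token_id 5, the push variant ends at length 3 with no slot at index 5, while slot_for ends at length 6 and returns a usable slot. Resizing to tokens.len(), as the non-position branch does, is an equally valid target as long as every ID returned by tokens.add() is below tokens.len().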

Environment

  • lance commit: d88dde4
  • Platform: Linux x86_64, Node.js 22

Panic Output

thread 'tokio-rt-worker' panicked at .../lance-index/src/scalar/inverted/builder.rs:856:57:
index out of bounds: the len is 1731 but the index is 4456
