
perf(index): add exact null-row bitmap to zone map and bloom filter#7372

Open
westonpace wants to merge 1 commit into
lance-format:mainfrom
westonpace:feat-expression-index


@westonpace

Member

X IS NULL and X IS NOT NULL are very common queries. Because validity is scattered throughout a file, answering them without scanning the column is not easy, so these queries can be very slow on very wide columns. Very wide columns also tend to be poor candidates for a btree index (too much duplicated data) or a bitmap index (too much cardinality).

On the flip side, a validity bitmap is pretty small, and maintaining one for each column is probably affordable, even for the lightweight zone map and bloom filter indexes. It would be nice to have a consistent IS NULL speedup whenever a column is indexed, regardless of the index type.


Captures a RowAddrTreeMap of every null row address during index training. IS NULL queries can now be answered in O(1) by returning SearchResult::exact(null_rows) instead of scanning all zones, which also eliminates the downstream recheck step.

Backward-compatible: older indexes without the null_bitmap.lance file load with null_rows = None and fall back to the previous zone-scan path.
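
A minimal sketch of the idea, for illustration only. This is not the code in this PR: `Zone`, `SearchOutcome` and `ZoneIndexSketch` are hypothetical names, a `BTreeSet<u64>` stands in for `RowAddrTreeMap`, and the `Option` models an older index that was written without `null_bitmap.lance`.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-zone statistics (a real zone map also keeps min/max, etc.).
#[derive(Debug, Clone)]
struct Zone {
    start: u64,      // first row address covered by the zone
    len: u64,        // number of rows in the zone
    null_count: u64, // number of null rows in the zone
}

/// Hypothetical stand-in for the index search result.
#[derive(Debug, PartialEq)]
enum SearchOutcome {
    /// Exactly these rows match, so no downstream recheck is needed.
    Exact(BTreeSet<u64>),
    /// A superset of the matching rows; the caller must recheck the data.
    AtMost(BTreeSet<u64>),
}

struct ZoneIndexSketch {
    zones: Vec<Zone>,
    /// `None` models an older index that has no null bitmap.
    null_rows: Option<BTreeSet<u64>>,
}

impl ZoneIndexSketch {
    /// Builds zone statistics and records every null row address in one pass.
    /// `zone_size` must be greater than zero.
    fn train(values: &[Option<i32>], zone_size: usize) -> Self {
        let mut zones = Vec::new();
        let mut null_rows = BTreeSet::new();
        for (zone_idx, chunk) in values.chunks(zone_size).enumerate() {
            let start = (zone_idx * zone_size) as u64;
            let mut null_count = 0;
            for (offset, value) in chunk.iter().enumerate() {
                if value.is_none() {
                    null_count += 1;
                    null_rows.insert(start + offset as u64);
                }
            }
            zones.push(Zone { start, len: chunk.len() as u64, null_count });
        }
        Self { zones, null_rows: Some(null_rows) }
    }

    /// Answers `X IS NULL`.
    fn search_is_null(&self) -> SearchOutcome {
        match &self.null_rows {
            // New path: hand back the stored set without visiting any zone.
            // (The clone is only for simplicity in this sketch.)
            Some(rows) => SearchOutcome::Exact(rows.clone()),
            // Fallback path: every row of any zone that contains a null is a candidate.
            None => {
                let mut candidates = BTreeSet::new();
                for zone in self.zones.iter().filter(|z| z.null_count > 0) {
                    candidates.extend(zone.start..zone.start + zone.len);
                }
                SearchOutcome::AtMost(candidates)
            }
        }
    }
}
```

In this sketch, training on `[Some(1), None, Some(3), None, Some(5)]` with a zone size of 2 gives an exact answer of rows 1 and 3. The same zones with `null_rows = None` give the inexact candidate set of rows 0 through 3, which would still need a recheck.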

@github-actions github-actions Bot added A-index Vector index, linalg, tokenizer performance labels Jun 19, 2026
@codecov

codecov Bot commented Jun 19, 2026


Codecov Report

❌ Patch coverage is 91.13300% with 18 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-index/src/scalar/bloomfilter.rs | 88.88% | 2 Missing and 7 partials ⚠️ |
| rust/lance-index/src/scalar/zonemap.rs | 91.75% | 2 Missing and 6 partials ⚠️ |
| rust/lance-index/src/scalar/zoned.rs | 96.00% | 0 Missing and 1 partial ⚠️ |


@westonpace westonpace force-pushed the feat-expression-index branch 4 times, most recently from 3aa6ced to 9580d80 Compare June 19, 2026 14:41
…ndexes

Captures a `RowAddrTreeMap` of every null row address during index
training.  IS NULL queries can now be answered in O(1) by returning
`SearchResult::exact(null_rows)` instead of scanning all zones, which
also eliminates the downstream recheck step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@westonpace westonpace force-pushed the feat-expression-index branch from 9580d80 to 99d03e0 Compare June 19, 2026 14:47