feat: support COUNT(*) pushdown on stable row id datasets #7360

Open
wkalt wants to merge 2 commits into
lance-format:main from
wkalt:ticket/gen-641/count-pushdown-stable-row-ids
wkalt commented Jun 18, 2026

Contributor

What

COUNT(*) pushdown (the count_pushdown rule -> CountFromMaskExec) was
disabled on datasets using stable row ids -- the count fell back to the regular
scan path. This enables it.

Why it was off, and the fix

The fast path intersects the scalar-index prefilter and the deletion mask (both
in stable-id space) with a fragments-allow universe -- but that universe was
built in row-address space. ANDing across the two id spaces silently dropped
rows in fragments > 0, so the rule was gated off entirely under stable row ids.

This builds the universe in stable-id space instead, via the live-id deletion
mask (restricted to the covered fragments). For an unfiltered count there is no
prefilter, so the universe is never materialized -- the answer comes straight
from fragment metadata.
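
The mismatch described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: it assumes a row address packs the fragment id into the upper 32 bits and the row offset into the lower 32 bits, and that stable ids are assigned sequentially in write order. `row_address`, `count_intersection`, and `demo_counts` are hypothetical names used only for illustration.

```rust
use std::collections::BTreeSet;

/// Assumed row-address layout: fragment id in the upper 32 bits,
/// row offset within the fragment in the lower 32 bits.
fn row_address(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// Count the ids present in both the prefilter and the universe.
fn count_intersection(prefilter: &BTreeSet<u64>, universe: &BTreeSet<u64>) -> usize {
    prefilter.intersection(universe).count()
}

/// Returns (count with a row-address universe, count with a stable-id universe).
fn demo_counts() -> (usize, usize) {
    // Two fragments of 4 rows each; stable ids assumed to be 0..8 in write order.
    let rows_per_fragment = 4u32;
    let stable_universe: BTreeSet<u64> = (0..8u64).collect();
    let address_universe: BTreeSet<u64> = (0..2u32)
        .flat_map(|frag| (0..rows_per_fragment).map(move |off| row_address(frag, off)))
        .collect();

    // Index prefilter in stable-id space: id 1 lives in fragment 0,
    // ids 5 and 6 live in fragment 1.
    let prefilter: BTreeSet<u64> = [1u64, 5, 6].iter().copied().collect();

    (
        count_intersection(&prefilter, &address_universe),
        count_intersection(&prefilter, &stable_universe),
    )
}

fn main() {
    let (mixed, stable) = demo_counts();
    println!("mixed spaces: {}, stable-id space: {}", mixed, stable);
}
```

Under these assumptions the two spaces coincide numerically only in fragment 0, so the mixed intersection keeps the fragment 0 match and loses both fragment 1 matches.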

Benchmark

New count_pushdown bench (synthetic: 5M rows, 50 fragments, ~1% scattered
deletions, BTree on the filter column). cargo bench -p lance --bench count_pushdown,
this branch vs main:

| benchmark | before | after | speedup |
| --- | --- | --- | --- |
| count_unfiltered | 44.5 ms | 69 µs | ~640x |
| count_filtered_1pct | 1.53 ms | 258 µs | ~5.9x |
| count_filtered_50pct | 28.3 ms | 5.5 ms | ~5.1x |

The filtered cases share the index-scan cost with the old path; the win there is
skipping materialization + counting of the matched rows.

Tests

Stable-id coverage in count_pushdown and count_from_mask, plus an
end-to-end indexed-filter count in dataset_aggregate (--features substrait).

wkalt and others added 2 commits June 18, 2026 08:16
Benchmarks COUNT(*) via the scanner aggregate plan on a synthetic
stable-row-id dataset (multiple fragments, scattered cross-fragment
deletions, BTree scalar index on the filter column), covering unfiltered
and filtered counts at two selectivities. Uses only public APIs, so it
runs on any revision for before/after comparison.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The COUNT(*) fast path (CountFromMaskExec) previously refused to fire on
datasets using stable row ids, so such counts fell back to the regular
scan path -- a full scan when unfiltered, or an index-prefiltered scan
plus row materialization when filtered. The cause was a coordinate-space
mismatch: the index prefilter and the deletion mask are both expressed in
stable-id space, but the exec built its fragments-allow universe in
row-address space. ANDing across the two silently dropped rows in
fragments > 0, so the rule was gated off for stable row ids.

Build the universe in stable-id space instead. create_restricted_deletion_mask
already returns a live-id allow list restricted to the covered fragments;
it returns None only when there are no deletions and full coverage, in
which case the universe is loaded from the covered fragments' row-id
sequences (metadata, not column data).

For an unfiltered count (no prefilter) the universe is never needed -- the
answer is just the live row count of the covered fragments, taken straight
from fragment metadata via count_live_rows. Materializing the full stable-id
universe (every row id in the dataset) just to take its length would be far
more expensive than the answer. The default row-address path is unchanged.
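
A rough sketch of that split, under stated assumptions: `FragmentMeta`, `count_rows`, and the `load_universe` callback are hypothetical stand-ins rather than Lance types, and this `count_live_rows` is a simplified illustration of the idea, not the real function's signature. The point is only that the universe is built lazily and is never touched when there is no prefilter.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for per-fragment metadata; not Lance's actual type.
struct FragmentMeta {
    physical_rows: u64,
    deleted_rows: u64,
}

/// Live rows computed from metadata alone; no row ids are materialized.
fn count_live_rows(fragments: &[FragmentMeta]) -> u64 {
    fragments
        .iter()
        .map(|f| f.physical_rows - f.deleted_rows)
        .sum()
}

/// Count with an optional stable-id prefilter. `load_universe` is invoked
/// only when a prefilter exists and must be intersected with the live ids.
fn count_rows<F>(
    fragments: &[FragmentMeta],
    prefilter: Option<&BTreeSet<u64>>,
    load_universe: F,
) -> u64
where
    F: FnOnce() -> BTreeSet<u64>,
{
    match prefilter {
        // Unfiltered: answer straight from fragment metadata.
        None => count_live_rows(fragments),
        // Filtered: intersect the prefilter with the live stable-id universe.
        Some(matches) => {
            let universe = load_universe();
            matches.intersection(&universe).count() as u64
        }
    }
}

fn main() {
    let fragments = [
        FragmentMeta { physical_rows: 4, deleted_rows: 0 },
        FragmentMeta { physical_rows: 4, deleted_rows: 1 },
    ];
    // Live stable ids 0..8 with id 6 deleted.
    let live_ids = || (0..8u64).filter(|id| *id != 6).collect::<BTreeSet<u64>>();
    let prefilter: BTreeSet<u64> = [1u64, 5, 6].iter().copied().collect();

    println!("unfiltered: {}", count_rows(&fragments, None, live_ids));
    println!("filtered: {}", count_rows(&fragments, Some(&prefilter), live_ids));
}
```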

Remove the gate in count_pushdown and cover the stable-id path with tests:
firing with/without an indexed filter, cross-fragment deletions, and an
end-to-end indexed-filter count.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions bot added the enhancement (New feature or request) label Jun 18, 2026

codecov Bot commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 74.76636% with 27 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance/src/io/exec/count_from_mask.rs | 58.73% | 23 Missing and 3 partials ⚠️ |
| rust/lance/src/io/exec/count_pushdown.rs | 97.72% | 0 Missing and 1 partial ⚠️ |