feat: support COUNT(*) pushdown on stable row id datasets by wkalt · Pull Request #7360 · lance-format/lance

wkalt · 2026-06-18T15:19:01Z

What

COUNT(*) pushdown (the count_pushdown rule -> CountFromMaskExec) was
disabled on datasets using stable row ids -- the count fell back to the regular
scan path. This enables it.

Why it was off, and the fix

The fast path intersects the scalar-index prefilter and the deletion mask (both
in stable-id space) with a fragments-allow universe -- but that universe was
built in row-address space. ANDing across the two id spaces silently dropped
rows in fragments > 0, so the rule was gated off entirely under stable row ids.
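
The mismatch can be reproduced with plain sets. This is a minimal sketch, not the code in this PR: it assumes a row address packs the fragment id into the upper 32 bits and the row offset into the lower 32 bits, and that stable ids are handed out sequentially from 0. All function names here are hypothetical.

```rust
use std::collections::BTreeSet;

/// Assumed row-address layout: fragment id in the upper 32 bits,
/// row offset within the fragment in the lower 32 bits.
fn row_address(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// Universe of every row, expressed as row addresses.
fn address_universe(num_fragments: u32, rows_per_fragment: u32) -> BTreeSet<u64> {
    (0..num_fragments)
        .flat_map(|frag| (0..rows_per_fragment).map(move |off| row_address(frag, off)))
        .collect()
}

/// Universe of every row, expressed as stable ids (assumed sequential from 0).
fn stable_universe(num_fragments: u32, rows_per_fragment: u32) -> BTreeSet<u64> {
    (0..(num_fragments as u64 * rows_per_fragment as u64)).collect()
}

/// Count the prefilter hits that survive the AND with the universe.
fn count_allowed(prefilter: &BTreeSet<u64>, universe: &BTreeSet<u64>) -> usize {
    prefilter.intersection(universe).count()
}

fn main() {
    // A prefilter in stable-id space that matches all 6 rows of 2 fragments.
    let prefilter = stable_universe(2, 3);
    // Mixed id spaces: only the rows of fragment 0 line up by coincidence.
    println!("mixed spaces: {}", count_allowed(&prefilter, &address_universe(2, 3)));
    // Same id space on both sides: every row is counted.
    println!("stable space: {}", count_allowed(&prefilter, &stable_universe(2, 3)));
}
```

Under these assumptions the addresses of fragment 0 happen to equal its stable ids, which is why only rows in later fragments go missing.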

This builds the universe in stable-id space instead, via the live-id deletion
mask (restricted to the covered fragments). For an unfiltered count there is no
prefilter, so the universe is never materialized -- the answer comes straight
from fragment metadata.
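
To illustrate why the unfiltered case needs no universe at all, here is a rough sketch of a metadata-only count. `FragmentMeta` and `live_rows_from_metadata` are hypothetical stand-ins, not the types or functions used in this PR; the sketch only assumes that each fragment records its physical row count and its number of deleted rows.

```rust
/// Hypothetical stand-in for the per-fragment metadata the count relies on.
struct FragmentMeta {
    physical_rows: u64,
    deleted_rows: u64,
}

/// Sum the live rows of the covered fragments without touching any row ids.
fn live_rows_from_metadata(fragments: &[FragmentMeta]) -> u64 {
    fragments
        .iter()
        .map(|f| f.physical_rows - f.deleted_rows)
        .sum()
}

fn main() {
    // Illustrative numbers only: 50 fragments of 100,000 rows, 1,000 deleted in each.
    let fragments: Vec<FragmentMeta> = (0..50)
        .map(|_| FragmentMeta { physical_rows: 100_000, deleted_rows: 1_000 })
        .collect();
    println!("{}", live_rows_from_metadata(&fragments));
}
```

The cost is proportional to the number of fragments rather than the number of rows, which is consistent with the unfiltered case showing the largest speedup.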

Benchmark

New count_pushdown bench (synthetic: 5M rows, 50 fragments, ~1% scattered
deletions, BTree on the filter column). cargo bench -p lance --bench count_pushdown,
this branch vs main:

| benchmark | before | after | speedup |
| --- | --- | --- | --- |
| `count_unfiltered` | 44.5 ms | 69 us | ~640x |
| `count_filtered_1pct` | 1.53 ms | 258 us | ~5.9x |
| `count_filtered_50pct` | 28.3 ms | 5.5 ms | ~5.1x |

The filtered cases share the index-scan cost with the old path; the win there is
skipping materialization + counting of the matched rows.
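
The speedup column is just before divided by after, once both are in the same unit. A small sketch for rechecking the reported timings (the `speedup` helper is hypothetical and not part of the bench):

```rust
/// Ratio of the old timing to the new one; both arguments in microseconds.
fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}

fn main() {
    // Timings copied from the benchmark results, converted to microseconds.
    let rows = [
        ("count_unfiltered", 44_500.0, 69.0),
        ("count_filtered_1pct", 1_530.0, 258.0),
        ("count_filtered_50pct", 28_300.0, 5_500.0),
    ];
    for (name, before, after) in rows.iter() {
        println!("{}: {:.1}x", name, speedup(*before, *after));
    }
}
```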

Tests

Stable-id coverage in count_pushdown and count_from_mask, plus an
end-to-end indexed-filter count in dataset_aggregate (--features substrait).

Benchmarks COUNT(*) via the scanner aggregate plan on a synthetic stable-row-id dataset (multiple fragments, scattered cross-fragment deletions, BTree scalar index on the filter column), covering unfiltered and filtered counts at two selectivities. Uses only public APIs, so it runs on any revision for before/after comparison.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The COUNT(*) fast path (CountFromMaskExec) previously refused to fire on datasets using stable row ids, so such counts fell back to the regular scan path -- a full scan when unfiltered, or an index-prefiltered scan plus row materialization when filtered.

The cause was a coordinate-space mismatch: the index prefilter and the deletion mask are both expressed in stable-id space, but the exec built its fragments-allow universe in row-address space. ANDing across the two silently dropped rows in fragments > 0, so the rule was gated off for stable row ids.

Build the universe in stable-id space instead. create_restricted_deletion_mask already returns a live-id allow list restricted to the covered fragments; it returns None only when there are no deletions and full coverage, in which case the universe is loaded from the covered fragments' row-id sequences (metadata, not column data).

For an unfiltered count (no prefilter) the universe is never needed -- the answer is just the live row count of the covered fragments, taken straight from fragment metadata via count_live_rows. Materializing the full stable-id universe (every row id in the dataset) just to take its length would be far more expensive than the answer.

The default row-address path is unchanged.

Remove the gate in count_pushdown and cover the stable-id path with tests: firing with/without an indexed filter, cross-fragment deletions, and an end-to-end indexed-filter count.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
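
The three cases in the commit message above can be sketched as one function. This is an illustration under assumptions, not the code in CountFromMaskExec: plain `BTreeSet<u64>` values stand in for the real mask types, `count_stable` is a hypothetical name, and the closure stands in for loading the covered fragments' row-id sequences.

```rust
use std::collections::BTreeSet;

/// Sketch of the stable-id count: every set holds stable row ids.
fn count_stable(
    prefilter: Option<&BTreeSet<u64>>,       // index hits, if the query has a filter
    restricted_mask: Option<&BTreeSet<u64>>, // live-id allow list, None if nothing is deleted
    live_rows_in_metadata: u64,              // live row count taken from fragment metadata
    load_row_id_sequences: impl FnOnce() -> BTreeSet<u64>, // lazy universe loader
) -> u64 {
    match prefilter {
        // Unfiltered: the universe is never built, metadata already has the answer.
        None => live_rows_in_metadata,
        Some(hits) => match restricted_mask {
            // Deletions present: the allow list itself is the universe.
            Some(live) => hits.intersection(live).count() as u64,
            // No deletions, full coverage: load the ids from the row-id sequences.
            None => hits.intersection(&load_row_id_sequences()).count() as u64,
        },
    }
}

fn main() {
    let live = BTreeSet::from([0u64, 1, 3, 4]); // id 2 has been deleted
    let hits = BTreeSet::from([1u64, 2, 3]);
    // The loader is unreachable whenever an allow list is available.
    let filtered = count_stable(Some(&hits), Some(&live), 4, || unreachable!());
    println!("{}", filtered);
}
```

Passing the loader as a closure keeps the expensive step lazy, mirroring the point that the universe should only be materialized when a prefilter actually needs it.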

codecov · 2026-06-18T15:58:23Z

Codecov Report

❌ Patch coverage is 74.76636% with 27 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance/src/io/exec/count_from_mask.rs | 58.73% | 23 Missing and 3 partials ⚠️ |
| rust/lance/src/io/exec/count_pushdown.rs | 97.72% | 0 Missing and 1 partial ⚠️ |


wkalt and others added 2 commits June 18, 2026 08:16

github-actions Bot added the enhancement New feature or request label Jun 18, 2026

wkalt wants to merge 2 commits into lance-format:main from wkalt:ticket/gen-641/count-pushdown-stable-row-ids
