fix(table): validate scan filters before empty planning returns #1128

Open
officialasishkumar wants to merge 1 commit into apache:main from officialasishkumar:fix/1119-planfiles-row-filter-validation

@officialasishkumar

Summary

  • validate scan row filters before planning can return early for missing snapshots or empty manifest lists
  • add coverage for invalid filters on both empty planning paths

Fixes #1119

Testing

  • go test ./table -run "TestPlanFilesValidatesRowFilter|TestBuild.*Evaluator" -count=1
  • go test ./table -count=1

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>

@tanmayrauth tanmayrauth left a comment

Contributor

Thanks for the fix.

Two test-coverage suggestions:

  • Add a test for the asOfTimestamp → empty-manifest path. The ordering (asOfTimestamp resolution before validateRowFilter) isn't asserted anywhere, so a future refactor could silently regress it.
  • The new tests construct Scan{} via struct literals, which bypasses defaults set by metadata.Scan(...). One end-to-end test through Table.Scan(...).PlanFiles(...) would lock in the public-API
    contract.

LGTM on the fix itself.

@laskoviymishka laskoviymishka left a comment

Contributor

One thing I'd want resolved before merge though: Java, PyIceberg, and iceberg-rust all return an empty iterable when there's no snapshot or no manifests, without binding the filter, so this PR makes iceberg-go the only client that hard-errors on an empty-table scan with an invalid filter. I'd either gate the validation on scan.Snapshot() != nil && len(manifestList) > 0, or make the divergence a conscious, documented choice. The validation primitive itself (newInclusiveMetricsEvaluator with include_empty_files) is also the wrong vehicle — iceberg.BindExpr directly would be clearer and avoid double-binding on the hot path. Left the rest inline.

Once that's settled, happy to take another pass.

Comment thread table/scanner.go
        scan.snapshotID = &snapshot.SnapshotID
        scan.asOfTimestamp = nil
    }
    if err := scan.validateRowFilter(); err != nil {

Contributor

I'd think hard about this placement. Java's SnapshotScan.planFiles returns CloseableIterable.empty() when the snapshot is nil, PyIceberg's scan_plan_helper returns iter([]), and iceberg-rust does the same — none of them bind the filter on the empty-table path. After this PR we're the only client that hard-errors there.

The fix to the gap is real, but I'd either gate this on scan.Snapshot() != nil && len(manifestList) > 0 (so we only validate when we'd actually have used the filter), or make the divergence an explicit, documented choice rather than a side effect of validation placement. wdyt?

@zeroshade

Member

> Java, PyIceberg, and iceberg-rust all return an empty iterable when there's no snapshot or no manifests, without binding the filter, so this PR makes iceberg-go the only client that hard-errors on an empty-table scan with an invalid filter.

Given that, we should definitely follow and match that behavior rather than decide to hard-error.

Development

Successfully merging this pull request may close these issues.

table: PlanFiles skips rowFilter validation on empty planning paths

4 participants