fix(table): validate scan filters before empty planning returns#1128
fix(table): validate scan filters before empty planning returns#1128officialasishkumar wants to merge 1 commit into
Conversation
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
tanmayrauth
left a comment
There was a problem hiding this comment.
Thanks for the fix.
Two test-coverage suggestions:
- Add a test for the
asOfTimestamp→ empty-manifest path. The ordering (asOfTimestampresolution beforevalidateRowFilter) isn't asserted anywhere, so a future refactor could silently regress it. - The new tests construct
Scan{}via struct literals, which bypasses defaults set bymetadata.Scan(...). One end-to-end test throughTable.Scan(...).PlanFiles(...)would lock in the public-API
contract.
LGTM on the fix itself.
laskoviymishka
left a comment
There was a problem hiding this comment.
One thing I'd want resolved before merge though: Java, PyIceberg, and iceberg-rust all return an empty iterable when there's no snapshot or no manifests, without binding the filter, so this PR makes iceberg-go the only client that hard-errors on an empty-table scan with an invalid filter. I'd either gate the validation on scan.Snapshot() != nil && len(manifestList) > 0, or make the divergence a conscious, documented choice. The validation primitive itself (newInclusiveMetricsEvaluator with include_empty_files) is also the wrong vehicle — iceberg.BindExpr directly would be clearer and avoid double-binding on the hot path. Left the rest inline.
Once that's settled, happy to take another pass.
| scan.snapshotID = &snapshot.SnapshotID | ||
| scan.asOfTimestamp = nil | ||
| } | ||
| if err := scan.validateRowFilter(); err != nil { |
There was a problem hiding this comment.
I'd think hard about this placement. Java's SnapshotScan.planFiles returns CloseableIterable.empty() when the snapshot is nil, PyIceberg's scan_plan_helper returns iter([]), and iceberg-rust does the same — none of them bind the filter on the empty-table path. After this PR we're the only client that hard-errors there.
The fix to the gap is real, but I'd either gate this on scan.Snapshot() != nil && len(manifestList) > 0 (so we only validate when we'd actually have used the filter), or make the divergence an explicit, documented choice rather than a side effect of validation placement. wdyt?
Given that we should definitely follow and match that behavior rather than decide to hard error |
Summary
Fixes #1119
Testing