Core: Handle nil logical partition values in manifests#1179
Conversation
tanmayrauth
left a comment
There was a problem hiding this comment.
The guard is right and clearly stops the panic, thanks for tracking it down. A few concrete things before it merges:
Now that Partition() returns nil instead of panicking, that nil flows into eq-delete conflict validation. In eqDeletePartitionsToFilter (conflict_validation.go:545) an identity-partitioned eq-delete calls anyToLiteral(p[partFieldID]), and there's no nil guard before it — anyToLiteral(nil) falls to the default at conflict_validation.go:603 and returns "unsupported partition value type <nil>", which fails the commit. So for a null identity-partition value this trades a panic for a rejected commit. Can you handle nil explicitly there (emit an IS NULL term) and add a test that drives that path?
On the test: it hand-builds the dataFile struct rather than decoding avro, so it exercises the conversion's nil handling but not the avro decode path that actually produced the nil — which is what the bug was. Could you write and read back a real OCF with a nullable union partition field (["null", {"type":"int","logicalType":"date"}]) so the test locks down the actual decode behavior?
Also, the ValueCounts() assertion goes through initColumnStatsData(), not the partition path, so it passes with or without the fix — Partition()[1000] is the only line doing real work in that test. Worth dropping the ValueCounts() lines or reframing them as an explicit "stays empty, doesn't panic" check.
Thanks that makes sense. I updated the patch to handle nil identity partition values in I also replaced the hand-build dataFile regression with a real manifest OCF round-trip using a nullable logical date partition field, and removed the Added coverage for the equality-delete conflict validation path with a null identity partition as well. |
| } | ||
|
|
||
| func (d *dataFile) convertAvroValueToIcebergType(v any, fieldID int) any { | ||
| if v == nil { |
There was a problem hiding this comment.
Returning nil here is correct, but it means a null partition value now reaches eq-delete conflict validation instead of panicking. In eqDeletePartitionsToFilter (conflict_validation.go:545) an identity-partitioned eq-delete calls anyToLiteral(p[partFieldID]) with no nil guard, and anyToLiteral(nil) falls to the default at conflict_validation.go:603 and returns "unsupported partition value type <nil>" — which fails the commit. So this trades the panic for a rejected commit on null identity-partition values. Can we handle nil explicitly in that path (emit an IS NULL term) and add a test that drives it?
Summary
Fixes a panic when reading manifest partition data where avro returns a nil value for a logical partition field.
Previously,
convertAvroValueToIcebergTypecould reach theatype.Datebranch with a nil value and panic with:interface conversion: interface {} is nil, not in32
This change return nil immediately when the Avro partition value is nil.
Testing
Added a regresion test that builds a data file with a nil
dtpartition value mapped to field ID 1000 using Avro logical typedatethen callsValueCounts()to exercise the same inilization path.Verified with:
go test -run TestDataFileInitializeMapDataHandlesNilLogicalPartitionValue .
go test ./...
git diff --check