This issue is AI-assisted, and I have verified all the details.
devlist thread: https://lists.apache.org/thread/qqw5oog5swmswxqqmp693vz1rw132xb6
Problem
The day partition transform's result type is documented as int, but the majority of implementations (Java, PyIceberg, Rust) write manifest partition fields with Avro logical type date. This divergence has already caused real interoperability failures between implementations.
Both encodings have the same physical representation (a 4-byte integer counting days from 1970-01-01). The difference is only the Avro schema annotation — but readers that validate schemas strictly will reject manifests written with the "wrong" annotation.
Spec-literal form (what the spec text implies): plain Avro int, with no logicalType annotation.
De facto convention (what most implementations write):
{ "type": "int", "logicalType": "date" }
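To make the distinction concrete, here is a minimal Python sketch using only the standard library. The helper names `days_from_epoch` and `physical_type` are illustrative and not taken from any Iceberg implementation. Schemas are modelled as parsed JSON values.

```python
from datetime import date

EPOCH = date(1970, 1, 1)

# The two Avro schema forms for a day-partition field, as parsed JSON.
PLAIN_INT = "int"
LOGICAL_DATE = {"type": "int", "logicalType": "date"}


def days_from_epoch(d: date) -> int:
    """Value stored in the manifest under either schema form."""
    return (d - EPOCH).days


def physical_type(schema) -> str:
    """Underlying Avro primitive, ignoring any logicalType annotation."""
    return schema if isinstance(schema, str) else schema["type"]


stored = days_from_epoch(date(2024, 1, 1))  # 19723

# Same physical type, so the stored value is encoded identically...
same_physical = physical_type(PLAIN_INT) == physical_type(LOGICAL_DATE)
# ...but a reader that compares schemas for exact equality sees a mismatch.
strict_match = PLAIN_INT == LOGICAL_DATE
```

Here `same_physical` is `True` while `strict_match` is `False`, which is the gap a strictly validating reader falls into.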
Current ecosystem state
| Implementation | Writes | Reads plain int? | Reads logical date? |
|---|---|---|---|
| Java (apache/iceberg) | logical date | ✅ | ✅ |
| PyIceberg (apache/iceberg-python) | logical date | ✅ | ✅ |
| Rust (apache/iceberg-rust) | logical date | ✅ (fixed in PR #496) | ✅ |
| Go (apache/iceberg-go) | plain int | ✅ | ⚠️ (open PR #915 to fix) |
Result: Before Rust's fix (PR #496), Rust could not read Go-written manifests. Go still diverges from the dominant writer convention, and its tolerance for logical date manifests is unconfirmed.
Why the spec is ambiguous
Two spec sections give conflicting guidance:
1. Partition transform result type table
| Transform | Result type |
|---|---|
| day | int |
Source: Partition Transforms
2. Avro type mapping (Appendix A)
| Iceberg type | Avro type |
|---|---|
| int | int |
| date | { "type": "int", "logicalType": "date" } |
Source: Appendix A
Literal reading: day(...) → result type int → Avro int (no logical type annotation).
Actual behavior: Java/PyIceberg/Rust treat day(...) result as DateType internally, which maps to Avro logical date. This makes the manifest partition field more human-readable but diverges from the literal spec text.
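The two readings can be traced as a pair of table lookups. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the mapping is restricted to the two Iceberg types quoted above.

```python
# Appendix A mapping, restricted to the two Iceberg types involved.
AVRO_TYPE = {
    "int": "int",
    "date": {"type": "int", "logicalType": "date"},
}

# Result type of day(...) under each reading.
SPEC_LITERAL_RESULT = "int"  # as documented in the transform table
DE_FACTO_RESULT = "date"     # as modelled by Java/PyIceberg/Rust


def manifest_field_schema(result_type: str):
    """Avro schema a writer would emit for a day partition field."""
    return AVRO_TYPE[result_type]


spec_literal_schema = manifest_field_schema(SPEC_LITERAL_RESULT)
de_facto_schema = manifest_field_schema(DE_FACTO_RESULT)
```

Both lookups follow Appendix A faithfully. The divergence comes entirely from which result type is fed into the mapping.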
Proposed resolution
I'd suggest Option C — keep int as the documented result type but explicitly address the manifest encoding:
For partition fields produced by the day transform, the stored value is an integer number of days from 1970-01-01. In Avro manifests, writers SHOULD use { "type": "int", "logicalType": "date" }. Readers MUST accept both plain int and int annotated with logicalType: date.
This:
- Matches existing Java/PyIceberg/Rust writer behavior (no ecosystem disruption)
- Requires readers to be tolerant (fixes the Go→Rust interop gap)
- Gives Go a clear target to align with
- Doesn't require changing the transform result type table
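As a rough illustration of the reader-side rule, a tolerant schema check could look like the following Python sketch. The function name is hypothetical, and unwrapping a nullable union is an assumption about how the partition field appears in the manifest schema, not something stated in this issue.

```python
def accepts_day_partition_schema(schema) -> bool:
    """True for plain int or int annotated with logicalType date."""
    # Assumption: the field may be wrapped in a nullable union such as
    # ["null", <schema>]; unwrap it before checking.
    if isinstance(schema, list):
        branches = [s for s in schema if s != "null"]
        if len(branches) != 1:
            return False
        schema = branches[0]
    if schema == "int":
        return True
    if isinstance(schema, dict) and schema.get("type") == "int":
        # No annotation and the date annotation are both acceptable.
        return schema.get("logicalType") in (None, "date")
    return False
```

A writer following the SHOULD would emit the annotated form, and a reader built this way would accept manifests from both Go and the other implementations.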
Alternative options
Option A: Enforce spec-literal int
Mandate plain Avro int. This would require Java/PyIceberg/Rust to change their writers and would create backward-compatibility issues with existing manifests.
Option B: Change the transform result type to date
Update the partition transform table so day(...) returns date instead of int. Cleanest long-term fix but changes the spec's type model. May have implications for other spec-derived logic (e.g., expression evaluation, projection).
Questions for the community
- Should the canonical writer form be plain int or logical date?
- Should readers be required (MUST) or recommended (SHOULD) to accept both?
- If the answer is logical date, should the transform result type table be updated to say date?
Related issues and PRs
Appendix: Detailed implementation analysis
Java (apache/iceberg)
Writer behavior: Models day(...) as DateType via Dates.getResultType() and Timestamps.getResultType(). The Avro conversion in TypeToSchema writes DateType as logical date.
Reader behavior: InternalReader.primitive() maps both Avro logical date and plain INT to ValueReaders.ints(). Fully tolerant of both forms.
PyIceberg (apache/iceberg-python)
Writer behavior: DayTransform.result_type returns DateType(). Avro conversion in schema_conversion.py maps DateType to logical date.
Reader behavior: DateReader subclasses IntegerReader, so both forms decode identically. Minor caveat: whether an explicit int→date projection is accepted is less clear than in Java's tolerant reader.
Rust (apache/iceberg-rust)
Writer behavior: Transform::Day.result_type() maps to PrimitiveType::Date. Avro schema conversion maps Date to logical date.
Reader behavior: The manifest reader reconstructs the expected schema from table metadata. Originally, a mismatch between expected and file schema caused #478. This was fixed in two steps: PR #479 changed day result type to Date, and PR #496 added Datum Date↔Int type conversion so plain int manifests are also accepted.
History: Originally used Int, changed to Date in PR #479, then PR #496 ensured both forms are readable.
Go (apache/iceberg-go)
Writer behavior: DayTransform.ResultType() returns PrimitiveTypes.Int32. Avro conversion maps Int32Type to plain Avro int.
Reader behavior: Reads its own plain int manifests. Tolerance for logical date manifests is not confirmed on merged main.
Open PR: #915 proposes switching to logical date for interoperability.