Spec ambiguity: Avro schema for day partition transform fields in manifests #16414

@kevinjqliu

This issue is AI-assisted, and I have verified all the details.

devlist thread: https://lists.apache.org/thread/qqw5oog5swmswxqqmp693vz1rw132xb6

Problem

The day partition transform's result type is documented as int, but the majority of implementations (Java, PyIceberg, Rust) write manifest partition fields with Avro logical type date. This divergence has already caused real interoperability failures between implementations.

Both encodings have the same physical representation: a 32-bit signed integer counting days from 1970-01-01, which Avro serializes identically with or without the annotation. The only difference is the Avro schema annotation, but readers that validate schemas strictly will reject manifests written with the "wrong" annotation.

Spec-literal form (what the spec text implies):

"type": "int"

De facto convention (what most implementations write):

{ "type": "int", "logicalType": "date" }
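To make the "same physical representation" point concrete, here is a minimal sketch of how Avro's binary encoding serializes an int (zigzag followed by a base-128 varint, per the Avro specification). The function name encode_avro_int is made up for this illustration and is not taken from any Iceberg implementation. Note that the encoder never looks at a logical type, so a day value produces the same bytes under either schema form.

```python
def encode_avro_int(n: int) -> bytes:
    """Encode a 32-bit signed int the way Avro binary encoding does.

    Hypothetical helper for illustration only. The logicalType annotation
    plays no part here: it lives in the schema, not in the data bytes.
    """
    if not -(2**31) <= n < 2**31:
        raise ValueError("value out of range for Avro int")
    z = (n << 1) ^ (n >> 31)  # zigzag: small magnitudes map to small codes
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte
            return bytes(out)


# 19723 days after 1970-01-01 is 2024-01-01; the bytes are the same whether
# the field is declared as plain int or as int with logicalType date.
day_value = 19723
encoded = encode_avro_int(day_value)
```

The interoperability problem is therefore purely a schema-matching problem: no data would need to be rewritten under any of the options discussed below.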

Current ecosystem state

Implementation | Writes | Reads plain int? | Reads logical date?
Java (apache/iceberg) | logical date | ✅ | ✅
PyIceberg (apache/iceberg-python) | logical date | ✅ | ✅
Rust (apache/iceberg-rust) | logical date | ✅ (fixed in PR #496) | ✅
Go (apache/iceberg-go) | plain int | ✅ | ⚠️ (open PR #915 to fix)

Result: Before Rust's fix (PR #496), Rust could not read Go-written manifests. Go still diverges from the dominant writer convention, and its tolerance for logical date manifests is unconfirmed.


Why the spec is ambiguous

Two spec sections give conflicting guidance:

1. Partition transform result type table

Transform | Result type
day | int

Source: Partition Transforms

2. Avro type mapping (Appendix A)

Iceberg type | Avro type
int | int
date | { "type": "int", "logicalType": "date" }

Source: Appendix A

Literal reading: day(...) → result type int → Avro int (no logical type annotation).

Actual behavior: Java/PyIceberg/Rust treat day(...) result as DateType internally, which maps to Avro logical date. This makes the manifest partition field more human-readable but diverges from the literal spec text.
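The two readings can be written out as a pair of lookups, which shows where they part ways. This is only a sketch: the dictionary names and the "spec_literal" / "de_facto" labels are invented for this illustration, and the mappings are limited to the rows quoted from the spec above.

```python
# Appendix A mapping, restricted to the two Iceberg types involved here.
AVRO_TYPE = {
    "int": "int",
    "date": {"type": "int", "logicalType": "date"},
}

# Result type of day(...) under each reading (labels are hypothetical).
DAY_RESULT_TYPE = {
    "spec_literal": "int",  # transform table: day -> int
    "de_facto": "date",     # Java/PyIceberg/Rust model the result as a date
}


def manifest_field_schema(reading: str):
    """Avro schema of a day-partition field under the given reading."""
    return AVRO_TYPE[DAY_RESULT_TYPE[reading]]
```

Both readings apply Appendix A faithfully. They differ only in the first lookup, which is why the ambiguity sits in the transform result type rather than in the Avro mapping.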


Proposed resolution

I'd suggest Option C, which keeps int as the documented result type but explicitly addresses the manifest encoding:

For partition fields produced by the day transform, the stored value is an integer number of days from 1970-01-01. In Avro manifests, writers SHOULD use { "type": "int", "logicalType": "date" }. Readers MUST accept both plain int and int annotated with logicalType: date.

This:

  • Matches existing Java/PyIceberg/Rust writer behavior (no ecosystem disruption)
  • Requires readers to be tolerant (fixes the Go→Rust interop gap)
  • Gives Go a clear target to align with
  • Doesn't require changing the transform result type table
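As a rough illustration of the proposed reader rule, the check below accepts both forms and nothing else. It is a sketch under the assumption that the Avro field type has already been parsed into Python values (a string or a dict); the names PREFERRED_DAY_SCHEMA and accepts_day_partition_schema are hypothetical and do not come from any existing implementation.

```python
# Form that writers SHOULD emit under the proposed wording.
PREFERRED_DAY_SCHEMA = {"type": "int", "logicalType": "date"}


def accepts_day_partition_schema(avro_type) -> bool:
    """Return True if a reader following the proposed rule must accept avro_type."""
    if avro_type == "int":
        return True  # plain int, short form
    if isinstance(avro_type, dict) and avro_type.get("type") == "int":
        # plain int in long form, or int annotated as a date
        return avro_type.get("logicalType") in (None, "date")
    return False
```

Other int-backed logical types such as time-millis are deliberately rejected, since the proposed wording only widens acceptance to the date annotation.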

Alternative options

Option A: Enforce spec-literal int

Mandate plain Avro int. This would require Java/PyIceberg/Rust to change their writers and would create backward-compatibility issues with existing manifests.

Option B: Change the transform result type to date

Update the partition transform table so day(...) returns date instead of int. Cleanest long-term fix but changes the spec's type model. May have implications for other spec-derived logic (e.g., expression evaluation, projection).


Questions for the community

  1. Should the canonical writer form be plain int or logical date?
  2. Should readers be required (MUST) or recommended (SHOULD) to accept both?
  3. If the answer is logical date, should the transform result type table be updated to say date?

Related issues and PRs


Appendix: Detailed implementation analysis

Java (apache/iceberg)

Writer behavior: Models day(...) as DateType via Dates.getResultType() and Timestamps.getResultType(). The Avro conversion in TypeToSchema writes DateType as logical date.

Reader behavior: InternalReader.primitive() maps both Avro logical date and plain INT to ValueReaders.ints(). Fully tolerant of both forms.


PyIceberg (apache/iceberg-python)

Writer behavior: DayTransform.result_type returns DateType(). Avro conversion in schema_conversion.py maps DateType to logical date.

Reader behavior: DateReader subclasses IntegerReader, so both forms decode identically. Minor caveat: whether PyIceberg explicitly treats int→date as a compatible projection is less clear than in Java, whose reader is explicitly tolerant of both.


Rust (apache/iceberg-rust)

Writer behavior: Transform::Day.result_type() maps to PrimitiveType::Date. Avro schema conversion maps Date to logical date.

Reader behavior: The manifest reader reconstructs the expected schema from table metadata. Originally, a mismatch between expected and file schema caused #478. This was fixed in two steps: PR #479 changed day result type to Date, and PR #496 added Datum Date↔Int type conversion so plain int manifests are also accepted.

History: Originally used Int, changed to Date in PR #479, then PR #496 ensured both forms are readable.
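The value-level conversion described for Rust can be pictured with a small Python analogue. This is not the iceberg-rust code: coerce_day_value and its "int" / "date" labels are hypothetical, and the sketch only shows the idea of converting a decoded partition value to whichever type the reader expects.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def coerce_day_value(value, expected: str):
    """Convert a decoded day-partition value to the type the reader expects.

    Hypothetical helper: accepts either an int (days since epoch) or a date
    and returns the form named by `expected` ("int" or "date").
    """
    if expected == "date":
        return value if isinstance(value, date) else EPOCH + timedelta(days=value)
    if expected == "int":
        return (value - EPOCH).days if isinstance(value, date) else value
    raise ValueError(f"unsupported expected type: {expected}")
```

With a conversion like this in place, a reader whose expected schema says date can still consume a manifest that declares the field as plain int, and the reverse.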


Go (apache/iceberg-go)

Writer behavior: DayTransform.ResultType() returns PrimitiveTypes.Int32. Avro conversion maps Int32Type to plain Avro int.

Reader behavior: Reads its own plain int manifests. Tolerance for logical date manifests is not confirmed on merged main.

Open PR: #915 proposes switching to logical date for interoperability.
