[GH-583]: Introduce chronological ordering for INT96 timestamps #584

Open
divjotarora wants to merge 3 commits into apache:master from divjotarora:int96-sort-order

@divjotarora

Rationale for this change

When writing INT96 timestamp columns, writers either omit stats altogether or emit stats using the TYPE_ORDER ordering. However, some writers were incorrectly emitting stats via bytewise comparisons, which does not result in chronological INT96 ordering. These stats are incorrect and must be ignored by readers. The goal of this change is to introduce a mechanism for readers to correctly determine the validity of an INT96 column's statistics and ignore them if they are potentially incorrect.
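As an illustration of the mismatch described above, the following Python sketch compares two INT96 values both ways. It assumes the conventional INT96 timestamp layout (8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes of Julian day number). The helper names `encode_int96`, `decode_int96`, and `chronological_key` are hypothetical and not part of any Parquet library.

```python
import struct
from typing import Tuple


def encode_int96(julian_day: int, nanos_of_day: int) -> bytes:
    """Pack a timestamp into 12 bytes: nanos-of-day (8 bytes LE), then Julian day (4 bytes LE).

    Assumed layout for illustration only.
    """
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw: bytes) -> Tuple[int, int]:
    """Return (julian_day, nanos_of_day) from a 12-byte INT96 value."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return julian_day, nanos_of_day


def chronological_key(raw: bytes) -> Tuple[int, int]:
    """Sort key that orders by day first, then by time within the day."""
    return decode_int96(raw)


# Two timestamps one day apart; the earlier one has the larger nanos-of-day.
earlier = encode_int96(julian_day=2_459_999, nanos_of_day=2)
later = encode_int96(julian_day=2_460_000, nanos_of_day=1)

# Python compares bytes objects lexicographically as unsigned bytes,
# which stands in here for a bytewise comparison of the raw values.
bytewise_min = min(earlier, later)
chrono_min = min(earlier, later, key=chronological_key)

print("bytewise min is the later timestamp:", bytewise_min == later)
print("chronological min is the earlier timestamp:", chrono_min == earlier)
```

Because the nanosecond field comes first in the byte layout, a bytewise comparison is dominated by the time of day rather than the day itself, so a min/max computed that way can exclude values that are actually in range.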

What changes are included in this PR?

This PR specifies a new INT96_TIMESTAMP_ORDER sort order specifically used for INT96 timestamp statistics. Additionally, it suggests writers use this ordering when emitting INT96 stats and that readers ignore stats for INT96 TYPE_ORDER'd columns.
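A minimal sketch of the reader-side rule suggested here, assuming a reader that tracks each column's physical type and declared column order. Only the names `TYPE_ORDER` and `INT96_TIMESTAMP_ORDER` come from this PR; the enums and the `can_trust_min_max` function are hypothetical and do not mirror any existing reader API.

```python
from enum import Enum


class PhysicalType(Enum):
    # Hypothetical subset of physical types, for illustration only.
    INT64 = "INT64"
    INT96 = "INT96"


class ColumnOrder(Enum):
    # TYPE_ORDER already exists; INT96_TIMESTAMP_ORDER is the order proposed in this PR.
    TYPE_ORDER = "TYPE_ORDER"
    INT96_TIMESTAMP_ORDER = "INT96_TIMESTAMP_ORDER"


def can_trust_min_max(physical_type: PhysicalType, column_order: ColumnOrder) -> bool:
    """Decide whether min/max statistics may be used for filtering.

    INT96 stats are trusted only when the writer declared the new order;
    INT96 stats written under TYPE_ORDER may be bytewise and are ignored.
    """
    if physical_type is PhysicalType.INT96:
        return column_order is ColumnOrder.INT96_TIMESTAMP_ORDER
    # Other types keep relying on TYPE_ORDER in this simplified sketch.
    return column_order is ColumnOrder.TYPE_ORDER


print(can_trust_min_max(PhysicalType.INT96, ColumnOrder.TYPE_ORDER))
print(can_trust_min_max(PhysicalType.INT96, ColumnOrder.INT96_TIMESTAMP_ORDER))
```

Ignoring stats is always safe for correctness: the reader only loses the ability to skip row groups or pages for that column.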

Do these changes have PoC implementations?

In-progress

Closes #583

Comment thread src/main/thrift/parquet.thrift Outdated
@wgtmac

wgtmac commented Jun 10, 2026

Member

I have some general questions about this proposal. A new column order requires changes on the writer side. If we need to change the writer code anyway, isn't it simpler to just let writers emit INT64 values for timestamps? If the goal is to avoid breaking old readers that currently consume INT96 values and are unable to upgrade, we have to make sure that they do not fail because of the unknown column order.

@emkornfield emkornfield left a comment

Contributor

The wording looks fine to me. I agree with @etseidl on the clarification. We should also try to figure out whether there is a way to address @wgtmac's concerns.

@etseidl

etseidl commented Jun 10, 2026

Contributor

I don't think the intent here is to support old readers. Instead, this is to aid new readers that want to be able to trust INT96 statistics generated by new writers.

We've already established that most old readers should be able to tolerate an unknown column order, with arrow-rs pre-57.0 being a notable exception.

@wgtmac

wgtmac commented Jun 10, 2026

Member

> this is to aid new readers that want to be able to trust INT96 statistics generated by new writers

What is the point of this? If both readers and writers are new, why not use INT64-typed timestamps instead? INT96 has been marked as deprecated for years.

@divjotarora

Author

> What is the point of this? If both readers and writers are new, why not use INT64-typed timestamps instead? INT96 has been marked as deprecated for years.

@wgtmac This is a fair point, but some large engines like Spark still default to writing INT96 timestamps, so having a way to signal that stats are definitively correct is useful there. I'm also following up with Spark folks to see how we can make progress and move towards INT64 timestamps by default, but in the short term I still feel this change is targeted and useful.

> If the goal is to avoid breaking old readers that currently consume INT96 values and are unable to upgrade, we have to make sure that they do not fail because of the unknown column order.

This change is no more dangerous or breaking for old readers than the recent change to add the IEEE floating point order. IIRC there was a message on the mailing list about some older versions of arrow-rs failing on unknown sort orders, but I believe that's been fixed as well.

@divjotarora divjotarora requested a review from etseidl June 10, 2026 09:39

@etseidl etseidl left a comment

Contributor

Thanks, changes look good 👍

BTW I have a Rust PoC nearly ready to go.

@etseidl

etseidl commented Jun 10, 2026

Contributor

Still need to add tests, but the Rust PoC is up: apache/arrow-rs#10106

@emkornfield

Contributor

> > What is the point of this? If both readers and writers are new, why not use INT64-typed timestamps instead? INT96 has been marked as deprecated for years.
>
> @wgtmac This is a fair point, but some large engines like Spark still default to writing INT96 timestamps, so having a way to signal that stats are definitively correct is useful there. I'm also following up with Spark folks to see how we can make progress and move towards INT64 timestamps by default, but in the short term I still feel this change is targeted and useful.

There is also no current alternative for nanosecond timestamps that need to span the SQL time range (years 0001-9999).
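To make the range limitation concrete, here is a rough Python sketch of how far a signed 64-bit count of nanoseconds since the Unix epoch can reach, compared with the span of years 0001-9999. The constants and the `nanos_to_datetime` helper are illustrative assumptions and are not taken from any Parquet implementation.

```python
from datetime import datetime, timedelta, timezone

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_to_datetime(nanos: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a datetime.

    datetime only has microsecond resolution, so the value is floored
    to whole microseconds; that is enough to show the reachable years.
    """
    return EPOCH + timedelta(microseconds=nanos // 1000)


earliest = nanos_to_datetime(INT64_MIN)
latest = nanos_to_datetime(INT64_MAX)

# Nanoseconds needed to cover 0001-01-01 through the last second of 9999-12-31.
span = datetime(9999, 12, 31, 23, 59, 59) - datetime(1, 1, 1)
span_nanos = (span.days * 86_400 + span.seconds) * 10**9
fits_in_64_bits = span_nanos <= 2**64 - 1

print("INT64 nanos cover years", earliest.year, "to", latest.year)
print("years 0001-9999 fit in 64 bits of nanoseconds:", fits_in_64_bits)
```

Under these assumptions an INT64 nanosecond timestamp covers only a few centuries around 1970, and the full 0001-9999 span at nanosecond precision needs more than 64 bits regardless of where the epoch is placed.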

@CurtHagenlocher

> There is also no current alternative for nanosecond timestamps that need to span the SQL time range (years 0001-9999).

As someone not really in the Spark world, doesn't the persistence of this type imply there's a need that Parquet isn't filling without it? Is it worth reconsidering its deprecation? I agree with @wgtmac that it's a bit weird to say "don't use this type! but if you do, then change your writers to emit this new sort order."

@etseidl

etseidl commented Jun 10, 2026

Contributor

@divjotarora

Author

parquet-java reference implementation: apache/parquet-java#3610

5 participants