
GH-3277: Support passing write configurations to footer optionally #3278

Closed
ArnavBalyan wants to merge 1 commit into apache:master from ArnavBalyan:arnavb/additional-info


@ArnavBalyan

Member
  • Write configurations are often needed for debugging and for the record, but application logs are typically lost within a few days.
  • This change adds an optional flag that, when enabled, writes the write configurations to the file footer.
  • The flag defaults to false; users can enable it to add this metadata to their Parquet files.
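
For readers who want to picture the behavior described above, here is a minimal sketch in plain Java. It is not the code from this PR: the class name `WriteConfigFooter`, the method `toFooterMetadata`, and the `writer.config.` key prefix are all hypothetical, chosen only for illustration. It assumes the write configurations are available as a string map and that the footer accepts string key-value pairs.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of parquet-java or of this PR.
final class WriteConfigFooter {
    // Assumed prefix so config entries do not collide with other footer keys.
    static final String PREFIX = "writer.config.";

    // Returns the key-value pairs to add to the footer.
    // When the flag is off (the default in the proposal), nothing is added.
    static Map<String, String> toFooterMetadata(Map<String, String> writeConfigs, boolean enabled) {
        Map<String, String> metadata = new TreeMap<>();
        if (!enabled) {
            return metadata;
        }
        for (Map.Entry<String, String> entry : writeConfigs.entrySet()) {
            // Skip unset values: footer key-value metadata holds strings only.
            if (entry.getValue() != null) {
                metadata.put(PREFIX + entry.getKey(), entry.getValue());
            }
        }
        return metadata;
    }
}
```

An application could build such a map itself and hand it to whatever extra-metadata hook its writer exposes; in parquet-java, for example, `WriteSupport.WriteContext` takes an extra metadata map alongside the schema.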

@ArnavBalyan

Member Author

cc @shangxinli @wgtmac Could you please take a look? Thanks!

@wgtmac

wgtmac commented Aug 24, 2025

Member

IMHO, this is merely custom logic that specific applications can handle well on their own. We don't want to take on the maintenance overhead.

@ArnavBalyan

Member Author

> IMHO, this is merely custom logic that specific applications can handle well on their own. We don't want to take on the maintenance overhead.

Thanks for the review! This is behind a feature flag, and it is up to users to enable it. The logic is minimal and gives end users and applications a high degree of clarity and debuggability, without each of them having to rewrite this logic. Maybe we could keep it off by default and let users enable it on demand, wdyt? @wgtmac @shangxinli

@wgtmac

wgtmac commented Aug 24, 2025

Member

Defaulting to off does not make it a valid feature for the Parquet library. If users want fine-grained control over which subset of configs is written, do we want to support that? If users have built a custom record writer on top of ParquetFileWriter (as Iceberg did), how do we know about it? How does ParquetRewriter handle conflicting configs when merging several Parquet files? To me this is pure application logic that users can handle well on their side. We don't want to pay for the complexity within the library.
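
To make two of the questions above concrete (choosing a subset of configs, and reconciling conflicting configs when several files are merged), here is a rough application-side sketch in plain Java. The class `FooterConfigPolicy` and both methods are hypothetical, and the "drop any key whose values disagree" rule is only one possible policy, assumed here for illustration. It does not describe what ParquetRewriter actually does.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical application-side policy, not part of parquet-java.
final class FooterConfigPolicy {
    // Keep only the config keys the application has chosen to record.
    static Map<String, String> select(Map<String, String> configs, Set<String> allowedKeys) {
        Map<String, String> selected = new TreeMap<>();
        for (String key : allowedKeys) {
            String value = configs.get(key);
            if (value != null) {
                selected.put(key, value);
            }
        }
        return selected;
    }

    // Merge per-file config maps (values assumed non-null).
    // Assumed policy: a key survives only if every file that has it agrees on the value.
    static Map<String, String> mergeAgreeing(List<Map<String, String>> footers) {
        Map<String, String> merged = new TreeMap<>();
        Set<String> conflicting = new TreeSet<>();
        for (Map<String, String> footer : footers) {
            for (Map.Entry<String, String> entry : footer.entrySet()) {
                String key = entry.getKey();
                if (conflicting.contains(key)) {
                    continue; // already known to disagree across files
                }
                String previous = merged.putIfAbsent(key, entry.getValue());
                if (previous != null && !previous.equals(entry.getValue())) {
                    merged.remove(key);
                    conflicting.add(key);
                }
            }
        }
        return merged;
    }
}
```

Other policies are equally plausible, such as keeping every distinct value or failing the merge, which is part of why the choice may sit more naturally in the application than in the library.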

@ArnavBalyan

Member Author

> Defaulting to off does not make it a valid feature for the Parquet library. If users want fine-grained control over which subset of configs is written, do we want to support that? If users have built a custom record writer on top of ParquetFileWriter (as Iceberg did), how do we know about it? How does ParquetRewriter handle conflicting configs when merging several Parquet files? To me this is pure application logic that users can handle well on their side. We don't want to pay for the complexity within the library.

Sure, sounds good! I will close this PR. I think some of the points above should be easy to solve, but this definitely requires more discussion 👍

@ArnavBalyan

Member Author

Closing this PR as suggested
