Core: Parquet per column compression #16094
Conversation
Let's wait for the Parquet code changes to be completed and released; once we get there, we can resume the discussion.
cc @emkornfield
Sounds good, thanks @singhpk234
I don’t think we should use column names in table properties. Column names are not unique.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
I followed the pattern of the existing table properties: https://github.com/Gerrrr/iceberg/blob/main/docs/docs/configuration.md. For example, the bloom filter config is also keyed by column name.
Closes #16090
Previously, all columns in a Parquet file were forced to use the same codec. This PR enables per-column Parquet compression, based on apache/parquet-java#3526 and apache/parquet-java#3396.
Changes
write.parquet.compression-level.column.) — columns without an override fall back to the global codec.
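The fallback rule above can be sketched roughly as follows. This is only an illustration of the lookup order, not the code in this PR: the helper name `resolve_for_column` is hypothetical, and it assumes the column name is appended directly to the `write.parquet.compression-level.column.` prefix, with the existing table-wide `write.parquet.compression-level` property as the fallback. A per-column codec would presumably resolve the same way with its own prefix.

```python
from typing import Mapping, Optional

# Prefix as quoted in the PR description; the column name is appended to it
# (assumed format, for illustration only).
COLUMN_LEVEL_PREFIX = "write.parquet.compression-level.column."
# Existing table-wide property, used here as the fallback value.
GLOBAL_LEVEL_KEY = "write.parquet.compression-level"


def resolve_for_column(
    props: Mapping[str, str],
    column: str,
    column_prefix: str = COLUMN_LEVEL_PREFIX,
    global_key: str = GLOBAL_LEVEL_KEY,
) -> Optional[str]:
    """Hypothetical helper: per-column override if set, else the global value."""
    override = props.get(column_prefix + column)
    if override is not None:
        return override
    # No override for this column: fall back to the table-wide setting.
    return props.get(global_key)


# Example table properties: one global level and one override for "payload".
props = {
    "write.parquet.compression-level": "3",
    "write.parquet.compression-level.column.payload": "9",
}
levels = {c: resolve_for_column(props, c) for c in ("id", "payload")}
```

Here `id` has no override and resolves to the global value, while `payload` picks up its own setting. If neither property is set, the helper returns `None` and leaves the default to the caller.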
Test with a Spark job
Result