Skip to content

feat(s3): Add support for hive.s3.min-part-size when writing#16935

Closed
czentgr wants to merge 1 commit intofacebookincubator:mainfrom
czentgr:cz_fix_s3_part_sizes
Closed

feat(s3): Add support for hive.s3.min-part-size when writing#16935
czentgr wants to merge 1 commit intofacebookincubator:mainfrom
czentgr:cz_fix_s3_part_sizes

Conversation

@czentgr
Copy link
Copy Markdown
Collaborator

@czentgr czentgr commented Mar 26, 2026

This PR introduces the hive.s3.min-part-size option available in Presto Java. It defines the minimum size of a buffer before multipart uploads are used. As a result files less than the min-part-size are sent in a single PUT request. Previously, even smaller files were sent through the multi-part upload. However, AWS limits the minimum size of a part to 5MB.
As a result, some S3 backends such as Apache Ozone ignored uploads of parts with less than 5MB leading to errors when finishing the multi-part upload.

The default part size is 10MB to match the existing hard coded value. Every part is of size min-part-size to conform to other S3 backends where each part size must be exactly the same size except the last part. As a result min-part-size is also the effective part size. Therefore, this change may affect performance with the default configuration.

@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 26, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 2917136
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/69d01c0d2219b30008a7f130

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 26, 2026
@czentgr czentgr marked this pull request as ready for review March 26, 2026 18:30
@czentgr czentgr requested a review from majetideepak as a code owner March 26, 2026 18:30
@czentgr czentgr force-pushed the cz_fix_s3_part_sizes branch from 2231e30 to f3dcfb6 Compare March 26, 2026 18:40
@majetideepak majetideepak changed the title feat(s3): Port hive.s3.min-part-size and enforce minimum part size feat(s3): Add support for hive.s3.min-part-size when writing Apr 2, 2026
Copy link
Copy Markdown
Collaborator

@majetideepak majetideepak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments.

protected:
class Impl;
std::shared_ptr<Impl> impl_;
std::shared_ptr<S3Config> config_;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base class has a variable with the same name config_. Let's name this s3Config_

Comment thread velox/connectors/hive/storage_adapters/s3fs/S3Config.cpp
@czentgr czentgr force-pushed the cz_fix_s3_part_sizes branch from f3dcfb6 to c5a4f98 Compare April 3, 2026 19:12
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 3, 2026

Build Impact Analysis

Selective Build Targets (building these covers all 13 affected)

cmake --build _build/release --target velox_query_replayer velox_read_benchmark velox_s3config_test velox_s3file_test velox_s3finalize_test velox_s3insert_test velox_s3metrics_test velox_s3multiendpoints_test velox_s3read_test velox_s3registration_test velox_tool_trace_test

Total affected: 13/557 targets

Warning: 1 file(s) could not be mapped to any target. A full build may be needed.

  • velox/docs/configs.rst
Affected targets (13)

Directly changed (9)

Target Changed Files
velox_s3config_test S3Config.h, S3ConfigTest.cpp
velox_s3file_test S3FileSystem.h, S3WriteFile.h
velox_s3finalize_test S3FileSystem.h
velox_s3fs S3Config.cpp, S3Config.h, S3FileSystem.cpp, S3FileSystem.h, S3WriteFile.cpp, ... (+1 more)
velox_s3insert_test S3FileSystem.h, S3InsertTest.cpp
velox_s3metrics_test S3FileSystem.h, S3WriteFile.h
velox_s3multiendpoints_test S3FileSystem.h
velox_s3read_test S3FileSystem.h
velox_s3registration_test S3FileSystem.h

Transitively affected (4)

  • velox_query_replayer
  • velox_query_trace_replayer_base
  • velox_read_benchmark
  • velox_tool_trace_test

Fast path • Graph from main@95ce7612570fe718a42403cd294fe6900fc04e82

This PR introduces the hive.s3.min-part-size option available in Presto Java.
It defines the minimum size of a buffer before multipart uploads are used.
As a result files less than the min-part-size are sent in a single PUT request.
Previously, even smaller files were sent through the multi-part upload. However,
AWS limits the minimum size of a part to 5MB.
As a result, some S3 backends such as Apache Ozone ignored uploads of parts
with less than 5MB leading to errors when finishing the multi-part upload.

The default part size is 10MB to match the existing hard coded value.
Every part is of size min-part-size to conform to other S3 backends where
each part size must be exactly the same size except the last part.
As a result min-part-size is also the effective part size.
Therefore, this change may affect performance with the default
configuration.
@czentgr czentgr force-pushed the cz_fix_s3_part_sizes branch from c5a4f98 to 2917136 Compare April 3, 2026 19:59
@majetideepak majetideepak added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Apr 3, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Apr 3, 2026

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this in D99502533.

@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Apr 6, 2026

@bikramSingh91 merged this pull request in 95894c3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants