
Commit f309c80

Address review comments
1 parent b0d4f55 commit f309c80

2 files changed

Lines changed: 36 additions & 30 deletions

File tree

  • docs
    • content.zh/docs/deployment/filesystems
    • content/docs/deployment/filesystems

docs/content.zh/docs/deployment/filesystems/s3.md

Lines changed: 18 additions & 15 deletions
@@ -86,18 +86,9 @@ The recommended way of setting up credentials on AWS is via [Identity and Access
 
 If you set this up correctly, you can manage access to S3 within AWS and don't need to distribute any access keys to Flink.
 
-#### Access Keys (Discouraged)
+#### Delegation Tokens
 
-Access to S3 can be granted via your **access and secret key pair**. Please note that this is discouraged since the [introduction of IAM roles](https://blogs.aws.amazon.com/security/post/Tx1XG3FX6VMU6O5/A-safer-way-to-distribute-AWS-credentials-to-EC2).
-
-You need to configure both `s3.access-key` and `s3.secret-key` in Flink's [configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
-
-```yaml
-s3.access-key: your-access-key
-s3.secret-key: your-secret-key
-```
-
-You can limit this configuration to JobManagers by using [delegation tokens]({{< ref "docs/deployment/security/security-delegation-token" >}}):
+[Delegation tokens]({{< ref "docs/deployment/security/security-delegation-token" >}}) provide time-bounded, automatically negotiated credentials. They are more secure than static access keys since tokens are temporary and don't require distributing long-lived secrets. Configure the credentials provider for your S3 implementation:
 
 ```yaml
 # For Native S3 implementation
@@ -108,6 +99,17 @@ fs.s3a.aws.credentials.provider: org.apache.flink.fs.s3.common.token.DynamicTemp
 presto.s3.credentials-provider: org.apache.flink.fs.s3.common.token.DynamicTemporaryAWSCredentialsProvider
 ```
 
+#### Access Keys
+
+Access to S3 can be granted via your **access and secret key pair**. While access keys are not inherently insecure, IAM roles are preferred as they avoid the need to manage and distribute static credentials. See the [introduction of IAM roles](https://blogs.aws.amazon.com/security/post/Tx1XG3FX6VMU6O5/A-safer-way-to-distribute-AWS-credentials-to-EC2) for more context.
+
+You need to configure both `s3.access-key` and `s3.secret-key` in Flink's [configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
+
+```yaml
+s3.access-key: your-access-key
+s3.secret-key: your-secret-key
+```
+
 ### Configure Non-S3 Endpoint
 
 The S3 filesystems also support using S3 compliant object stores.
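For quick reference, the provider lines visible in this diff assemble into the snippet below. The Hadoop class name is truncated in the hunk header above and is completed here on the assumption that it matches the Presto line:

```yaml
# Hadoop S3A implementation (class name assumed to match the Presto line below)
fs.s3a.aws.credentials.provider: org.apache.flink.fs.s3.common.token.DynamicTemporaryAWSCredentialsProvider
# Presto implementation, shown verbatim in the diff
presto.s3.credentials-provider: org.apache.flink.fs.s3.common.token.DynamicTemporaryAWSCredentialsProvider
```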
@@ -314,15 +316,15 @@ All three S3 implementations register as handlers for the *s3://* scheme. Additi
 
 It is safe to load multiple S3 plugin JARs simultaneously — the priority mechanism ensures only one factory handles each scheme. The Native S3 implementation has the lowest priority (`-1` vs the default `0`), so when another implementation is present, it will take precedence for all overlapping schemes (e.g., *s3://* and *s3a://*). You can override factory priorities via the `fs.<scheme>.priority.<factoryClassName>` configuration option.
 
-You can use multiple S3 implementations simultaneously by leveraging their different URI schemes. For example, if a job uses the [FileSystem]({{< ref "docs/connectors/datastream/filesystem" >}}) sink (Hadoop-only) but Presto for checkpointing:
+You can use multiple S3 implementations simultaneously by leveraging their different URI schemes. For example, if a job uses the [FileSystem]({{< ref "docs/connectors/datastream/filesystem" >}}) sink with Hadoop but Presto for checkpointing:
 
 - Use *s3a://* scheme for the sink (Hadoop)
 - Use *s3p://* scheme for checkpointing (Presto)
 
 {{< hint info >}}
 The Native S3 implementation does not introduce a new URI scheme. It supports the existing *s3://* and *s3a://* schemes. Since both the Native S3 and Hadoop implementations register for the same schemes, Flink uses a priority-based mechanism to select which factory handles each scheme. By default, Native S3 has the lowest priority and will **not** be selected when another implementation is present for the same scheme.
 
-To use the Native S3 implementation, either place only the `flink-s3-fs-native` plugin JAR in the `plugins` directory, or use the `fs.<scheme>.priority.<factoryClassName>` configuration to raise its priority while keeping other implementations loaded.
+To use the Native S3 implementation, either place only the `flink-s3-fs-native` plugin JAR in the `plugins` directory, or use the `fs.<scheme>.priority.<factoryClassName>` configuration to raise its priority while other implementations are present in `plugins`.
 {{< /hint >}}
 
 ---
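To make the scheme split concrete, a minimal configuration sketch follows. It assumes the standard `state.checkpoints.dir` option and an illustrative bucket name; the factory class in the priority override is hypothetical, since the actual class shipped in `flink-s3-fs-native` is not named in this diff:

```yaml
# Checkpoints go through the Presto implementation via the s3p:// scheme.
state.checkpoints.dir: s3p://my-bucket/checkpoints

# Optional: raise Native S3 above the default priority 0 so it wins the s3://
# scheme even with other plugins loaded. The factory class name below is
# hypothetical; substitute the class actually shipped with flink-s3-fs-native.
fs.s3.priority.org.apache.flink.fs.s3native.NativeS3FileSystemFactory: 1
```

Sink paths in job code would then use the *s3a://* scheme (Hadoop), e.g. *s3a://my-bucket/output*.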
@@ -388,11 +390,12 @@ It is recommended to first configure and verify that Flink works without using `
 
 #### Credentials
 
-If you are using [access keys](#access-keys-discouraged), they will be passed to `s5cmd`.
+If you are using [access keys](#access-keys), they will be passed to `s5cmd`.
 Apart from that, `s5cmd` has its own independent way of [using credentials](https://github.com/peak/s5cmd?tab=readme-ov-file#specifying-credentials).
 
 #### Limitations
 
-Currently, If flink-s3-fs-hadoop / flink-s3-fs-presto uses `s5cmd` only during recovery, when downloading state files from S3 and using RocksDB.
+Currently, `flink-s3-fs-hadoop` and `flink-s3-fs-presto` use `s5cmd` only during recovery, when downloading state files from S3 and using RocksDB.
+`flink-s3-fs-native` uses `S3TransferManager` when enabled via `s3.bulk-copy.enabled` (default: `true`) for bulk copy operations and `s3.async.enabled` (default: `true`) for async read/write, providing similar performance benefits.
 
 {{< top >}}
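As a minimal sketch of the Native S3 options named in the added lines above (both default to `true`, so setting them is only needed to opt out):

```yaml
# Defaults shown; set to false to disable the S3TransferManager-based paths.
s3.bulk-copy.enabled: true   # bulk copy operations via S3TransferManager
s3.async.enabled: true       # async read/write
```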

docs/content/docs/deployment/filesystems/s3.md

Lines changed: 18 additions & 15 deletions
@@ -86,18 +86,9 @@ The recommended way of setting up credentials on AWS is via [Identity and Access
 
 If you set this up correctly, you can manage access to S3 within AWS and don't need to distribute any access keys to Flink.
 
-#### Access Keys (Discouraged)
+#### Delegation Tokens
 
-Access to S3 can be granted via your **access and secret key pair**. Please note that this is discouraged since the [introduction of IAM roles](https://blogs.aws.amazon.com/security/post/Tx1XG3FX6VMU6O5/A-safer-way-to-distribute-AWS-credentials-to-EC2).
-
-You need to configure both `s3.access-key` and `s3.secret-key` in Flink's [configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
-
-```yaml
-s3.access-key: your-access-key
-s3.secret-key: your-secret-key
-```
-
-You can limit this configuration to JobManagers by using [delegation tokens]({{< ref "docs/deployment/security/security-delegation-token" >}}):
+[Delegation tokens]({{< ref "docs/deployment/security/security-delegation-token" >}}) provide time-bounded, automatically negotiated credentials. They are more secure than static access keys since tokens are temporary and don't require distributing long-lived secrets. Configure the credentials provider for your S3 implementation:
 
 ```yaml
 # For Native S3 implementation
@@ -108,6 +99,17 @@ fs.s3a.aws.credentials.provider: org.apache.flink.fs.s3.common.token.DynamicTemp
 presto.s3.credentials-provider: org.apache.flink.fs.s3.common.token.DynamicTemporaryAWSCredentialsProvider
 ```
 
+#### Access Keys
+
+Access to S3 can be granted via your **access and secret key pair**. While access keys are not inherently insecure, IAM roles are preferred as they avoid the need to manage and distribute static credentials. See the [introduction of IAM roles](https://blogs.aws.amazon.com/security/post/Tx1XG3FX6VMU6O5/A-safer-way-to-distribute-AWS-credentials-to-EC2) for more context.
+
+You need to configure both `s3.access-key` and `s3.secret-key` in Flink's [configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
+
+```yaml
+s3.access-key: your-access-key
+s3.secret-key: your-secret-key
+```
+
 ### Configure Non-S3 Endpoint
 
 The S3 filesystems also support using S3 compliant object stores.
@@ -314,15 +316,15 @@ All three S3 implementations register as handlers for the *s3://* scheme. Additi
 
 It is safe to load multiple S3 plugin JARs simultaneously — the priority mechanism ensures only one factory handles each scheme. The Native S3 implementation has the lowest priority (`-1` vs the default `0`), so when another implementation is present, it will take precedence for all overlapping schemes (e.g., *s3://* and *s3a://*). You can override factory priorities via the `fs.<scheme>.priority.<factoryClassName>` configuration option.
 
-You can use multiple S3 implementations simultaneously by leveraging their different URI schemes. For example, if a job uses the [FileSystem]({{< ref "docs/connectors/datastream/filesystem" >}}) sink (Hadoop-only) but Presto for checkpointing:
+You can use multiple S3 implementations simultaneously by leveraging their different URI schemes. For example, if a job uses the [FileSystem]({{< ref "docs/connectors/datastream/filesystem" >}}) sink with Hadoop but Presto for checkpointing:
 
 - Use *s3a://* scheme for the sink (Hadoop)
 - Use *s3p://* scheme for checkpointing (Presto)
 
 {{< hint info >}}
 The Native S3 implementation does not introduce a new URI scheme. It supports the existing *s3://* and *s3a://* schemes. Since both the Native S3 and Hadoop implementations register for the same schemes, Flink uses a priority-based mechanism to select which factory handles each scheme. By default, Native S3 has the lowest priority and will **not** be selected when another implementation is present for the same scheme.
 
-To use the Native S3 implementation, either place only the `flink-s3-fs-native` plugin JAR in the `plugins` directory, or use the `fs.<scheme>.priority.<factoryClassName>` configuration to raise its priority while keeping other implementations loaded.
+To use the Native S3 implementation, either place only the `flink-s3-fs-native` plugin JAR in the `plugins` directory, or use the `fs.<scheme>.priority.<factoryClassName>` configuration to raise its priority while other implementations are present in `plugins`.
 {{< /hint >}}
 
 ---
@@ -388,11 +390,12 @@ It is recommended to first configure and verify that Flink works without using `
 
 #### Credentials
 
-If you are using [access keys](#access-keys-discouraged), they will be passed to `s5cmd`.
+If you are using [access keys](#access-keys), they will be passed to `s5cmd`.
 Apart from that, `s5cmd` has its own independent way of [using credentials](https://github.com/peak/s5cmd?tab=readme-ov-file#specifying-credentials).
 
 #### Limitations
 
-Currently, If flink-s3-fs-hadoop / flink-s3-fs-presto uses `s5cmd` only during recovery, when downloading state files from S3 and using RocksDB.
+Currently, `flink-s3-fs-hadoop` and `flink-s3-fs-presto` use `s5cmd` only during recovery, when downloading state files from S3 and using RocksDB.
+`flink-s3-fs-native` uses `S3TransferManager` when enabled via `s3.bulk-copy.enabled` (default: `true`) for bulk copy operations and `s3.async.enabled` (default: `true`) for async read/write, providing similar performance benefits.
 
 {{< top >}}
