docs/content.zh/docs/deployment/filesystems/s3.md: 18 additions & 15 deletions
@@ -86,18 +86,9 @@ The recommended way of setting up credentials on AWS is via [Identity and Access
 If you set this up correctly, you can manage access to S3 within AWS and don't need to distribute any access keys to Flink.
 
-#### Access Keys (Discouraged)
+#### Delegation Tokens
 
-Access to S3 can be granted via your **access and secret key pair**. Please note that this is discouraged since the [introduction of IAM roles](https://blogs.aws.amazon.com/security/post/Tx1XG3FX6VMU6O5/A-safer-way-to-distribute-AWS-credentials-to-EC2).
-
-You need to configure both `s3.access-key` and `s3.secret-key` in Flink's [configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
-
-```yaml
-s3.access-key: your-access-key
-s3.secret-key: your-secret-key
-```
-
-You can limit this configuration to JobManagers by using [delegation tokens]({{< ref "docs/deployment/security/security-delegation-token" >}}):
+[Delegation tokens]({{< ref "docs/deployment/security/security-delegation-token" >}}) provide time-bounded, automatically negotiated credentials. They are more secure than static access keys since tokens are temporary and don't require distributing long-lived secrets. Configure the credentials provider for your S3 implementation:
+
+#### Access Keys
+
+Access to S3 can be granted via your **access and secret key pair**. While access keys are not inherently insecure, IAM roles are preferred as they avoid the need to manage and distribute static credentials. See the [introduction of IAM roles](https://blogs.aws.amazon.com/security/post/Tx1XG3FX6VMU6O5/A-safer-way-to-distribute-AWS-credentials-to-EC2) for more context.
+
+You need to configure both `s3.access-key` and `s3.secret-key` in Flink's [configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
+
+```yaml
+s3.access-key: your-access-key
+s3.secret-key: your-secret-key
+```
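As a point of reference for the added delegation-token section, a minimal sketch of the Flink configuration might look as follows. The provider name `s3` is an assumption, not confirmed by this diff; check the delegation token docs for the name actually registered by your S3 filesystem plugin.

```yaml
# Sketch only: Flink's delegation token framework (enabled by default).
security.delegation.tokens.enabled: true
# Hypothetical provider name "s3"; the real name registered by the
# S3 plugin may differ, see the delegation token documentation.
security.delegation.token.provider.s3.enabled: true
```

With this in place, the static `s3.access-key`/`s3.secret-key` pair only needs to be readable where tokens are obtained, rather than being distributed to every process.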
 
 ### Configure Non-S3 Endpoint
 
 The S3 filesystems also support using S3 compliant object stores.
@@ -314,15 +316,15 @@ All three S3 implementations register as handlers for the *s3://* scheme. Additi
 It is safe to load multiple S3 plugin JARs simultaneously — the priority mechanism ensures only one factory handles each scheme. The Native S3 implementation has the lowest priority (`-1` vs the default `0`), so when another implementation is present, it will take precedence for all overlapping schemes (e.g., *s3://* and *s3a://*). You can override factory priorities via the `fs.<scheme>.priority.<factoryClassName>` configuration option.
 
-You can use multiple S3 implementations simultaneously by leveraging their different URI schemes. For example, if a job uses the [FileSystem]({{< ref "docs/connectors/datastream/filesystem" >}}) sink (Hadoop-only) but Presto for checkpointing:
+You can use multiple S3 implementations simultaneously by leveraging their different URI schemes. For example, if a job uses the [FileSystem]({{< ref "docs/connectors/datastream/filesystem" >}}) sink with Hadoop but Presto for checkpointing:
 
 - Use *s3a://* scheme for the sink (Hadoop)
 - Use *s3p://* scheme for checkpointing (Presto)
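A configuration using the two schemes above might look like the following sketch; the bucket name is a placeholder:

```yaml
# Checkpoints go through the Presto implementation via the s3p:// scheme.
state.checkpoints.dir: s3p://my-bucket/checkpoints
```

The sink would then write to a path such as `s3a://my-bucket/output`, which the Hadoop implementation handles.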
 {{< hint info >}}
 The Native S3 implementation does not introduce a new URI scheme. It supports the existing *s3://* and *s3a://* schemes. Since both the Native S3 and Hadoop implementations register for the same schemes, Flink uses a priority-based mechanism to select which factory handles each scheme. By default, Native S3 has the lowest priority and will **not** be selected when another implementation is present for the same scheme.
 
-To use the Native S3 implementation, either place only the `flink-s3-fs-native` plugin JAR in the `plugins` directory, or use the `fs.<scheme>.priority.<factoryClassName>` configuration to raise its priority while keeping other implementations loaded.
+To use the Native S3 implementation, either place only the `flink-s3-fs-native` plugin JAR in the `plugins` directory, or use the `fs.<scheme>.priority.<factoryClassName>` configuration to raise its priority while other implementations are present in `plugins`.
 {{< /hint >}}
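The priority override mentioned in the hint could be sketched as below; the factory class name is a placeholder, not the plugin's actual class:

```yaml
# Raise the native factory above the default priority of 0 so it wins
# scheme resolution for s3://. Replace the placeholder class name with
# the actual factory class of flink-s3-fs-native.
fs.s3.priority.com.example.NativeS3FileSystemFactory: 1
```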
 
 ---
@@ -388,11 +390,12 @@ It is recommended to first configure and verify that Flink works without using `s5cmd`
 #### Credentials
 
-If you are using [access keys](#access-keys-discouraged), they will be passed to `s5cmd`.
+If you are using [access keys](#access-keys), they will be passed to `s5cmd`.
 
 Apart from that, `s5cmd` has its own independent way of [using credentials](https://github.com/peak/s5cmd?tab=readme-ov-file#specifying-credentials).
 
 #### Limitations
 
-Currently, If flink-s3-fs-hadoop / flink-s3-fs-presto uses `s5cmd` only during recovery, when downloading state files from S3 and using RocksDB.
+Currently, `flink-s3-fs-hadoop` and `flink-s3-fs-presto` use `s5cmd` only during recovery, when downloading state files from S3 and using RocksDB.
+
+`flink-s3-fs-native` uses `S3TransferManager` when enabled via `s3.bulk-copy.enabled` (default: `true`) for bulk copy operations and `s3.async.enabled` (default: `true`) for async read/write, providing similar performance benefits.
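Based on the options named in the added line, a sketch disabling the native implementation's accelerated paths (for example while debugging) would be:

```yaml
# Both options default to true; setting them to false falls back to
# plain (non-accelerated) transfers.
s3.bulk-copy.enabled: false
s3.async.enabled: false
```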
docs/content/docs/deployment/filesystems/s3.md: 18 additions & 15 deletions (the same changes applied to the English docs)