Handle BucketAlreadyOwnedByYou race condition in S3 bucket initialization#7325
Handle BucketAlreadyOwnedByYou race condition in S3 bucket initialization#7325hanwen-cluster merged 3 commits intoaws:developfrom
Conversation
7e526f3 to
7890cbb
Compare
gmarciani
left a comment
There was a problem hiding this comment.
[Docs] This bug fix deserves an entry in the changelog because it prevents cluster creation failures.
| LOGGER.error("Unable to create S3 bucket %s.", bucket.name) | ||
| raise error | ||
| if error.error_code == "BucketAlreadyOwnedByYou": | ||
| LOGGER.info("S3 bucket %s was concurrently created by another process. Proceeding.", bucket.name) |
There was a problem hiding this comment.
[minor] What about logging instead "Bucket ${BUCKET_NAME} is already owned by you". The fact the the bucket was created by a concurrent process is a low level detail we observed in our integ test, but this is a customer facing log line
There was a problem hiding this comment.
This detail is needed because the error wouldn't appear if the bucket existed long before cluster creation. This error is only triggered if the race condition happens
There was a problem hiding this comment.
This is message is part of the product not just the Integ test Infra so the messaging should be aligned with what the CX would see, which may or may not be multi-threaded cluster creation process.
7890cbb to
d298487
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #7325 +/- ##
===========================================
- Coverage 90.08% 90.07% -0.01%
===========================================
Files 182 182
Lines 16730 16732 +2
===========================================
+ Hits 15071 15072 +1
- Misses 1659 1660 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| ------ | ||
|
|
||
| **BUG FIXES** | ||
| - Fix sporadic S3 bucket creation failure when multiple create-cluster commands are running simultaneously in the same region. |
There was a problem hiding this comment.
Do we have a name for the specific bucket like default bucket? or do-not-delete bucket? IF so I suggest we specify that in the changelog.
…tion The default S3 bucket name is deterministic per account+region, so concurrent cluster creation calls (e.g. parallel integration tests) can race: multiple callers get a 404 from head_bucket, then all attempt create_bucket. The first succeeds; the rest fail with BucketAlreadyOwnedByYou, which was previously unhandled and surfaced as "Unable to initialize s3 bucket." Treat BucketAlreadyOwnedByYou as a success condition since it confirms the bucket exists and is owned by the caller, which is the desired end state. Log at info level and proceed normally.
d298487 to
3836747
Compare
Description of changes
The default S3 bucket name is deterministic per account+region, so concurrent cluster creation calls (e.g. parallel integration tests) can race: multiple callers get a 404 from head_bucket, then all attempt create_bucket. The first succeeds; the rest fail with BucketAlreadyOwnedByYou, which was previously unhandled and surfaced as "Unable to initialize s3 bucket."
Treat BucketAlreadyOwnedByYou as a success condition since it confirms the bucket exists and is owned by the caller, which is the desired end state. Log at info level and proceed normally.
Tests
References
Checklist
developadd the branch name as prefix in the PR title (e.g.[release-3.6]).Please review the guidelines for contributing and Pull Request Instructions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.