fix: keep polling service-linked-role delete on read-after-write lag by james00012 · Pull Request #1149 · gruntwork-io/cloud-nuke

james00012 · 2026-06-13T23:10:55Z

Follow-up to #1147 (merged). Based on master.

Summary

Addresses review feedback on #1147 and one correctness gap found while reviewing that change.

Intent

#1147 fixed false failures when a service-linked role deletes faster than the first status poll, by verifying the role via GetRole on a NoSuchEntity from GetServiceLinkedRoleDeletionStatus. Reviewing it surfaced a residual: the "task gone but role still present" branch returned a hard error, and PollUntil treats any returned error as terminal. IAM is eventually consistent, so a GetRole issued right after a fast delete can still see the role for a moment, which would turn read-after-write lag into the same false failure #1147 set out to remove.

Decision

Treat "task record gone, role still visible" as propagation lag and keep polling, the same way an in-progress deletion is handled (return false, nil). AWS has already accepted the deletion when the task record is purged, so a persistent "still present" state is implausible, and the existing 5 minute poll timeout still bounds a genuinely stuck delete. This fully closes the false-failure class rather than narrowing it.

Alternative considered: keep the fail-fast error. Rejected because the most likely cause of "still present" here is lag, not a real failure, so failing fast reintroduces the bug in a smaller window.

What changed

aws/resources/iam_service_linked_role.go: the "role still present" branch now logs and returns false, nil to continue polling instead of returning an error. Added a note that the verification requires iam:GetRole.
aws/resources/iam_service_linked_role_test.go: replaced the "still present returns error" test with one that models read-after-write lag (GetRole sees the role once, then NoSuchEntity) and asserts success. Mock GetRole gained a per-call result sequence. Lowercased a throttling error string per review (Go convention).

Illustration

status poll -> NoSuchEntity (task purged)
  -> GetRole
       NoSuchEntity        -> success
       role still visible  -> keep polling (lag), bounded by timeout   <- changed
       other error         -> propagate

Validation

Unit tests: go test ./aws/resources/ -run TestIAMServiceLinkedRoles -v, PASS (5). The lag test takes ~3s, confirming it retried across a poll interval and then succeeded rather than passing trivially.
gofmt, go vet ./aws/resources/, go build ./...: clean (re-run after rebasing onto current master).
Local read-only E2E against phxdevops (087285199408): inspect-aws --resource-type iam-service-linked-role listed ~40 service-linked roles including AWSServiceRoleForOrganizations, AWSServiceRoleForSSO, AWSServiceRoleForSupport, AWSServiceRoleForTrustedAdvisor. List/identify path works with no regression.
Not run: live nuke. The fast-delete race is non-deterministic and destructive in a shared account, so the delete branch is covered by unit tests.

Known limitations / follow-ups

A genuinely stuck "role still present" state now surfaces only at the 5 minute timeout rather than immediately. This matches existing in-progress handling and is the intended tradeoff.

Summary by CodeRabbit

Bug Fixes

Improved IAM role deletion verification – Enhanced the deletion process to gracefully handle temporary inconsistencies when verifying role state, accounting for AWS IAM's eventual consistency. The system now retries rather than immediately failing when encountering transient delays, improving reliability of role deletion operations.

coderabbitai · 2026-06-13T23:11:04Z

Walkthrough

This change updates IAM service-linked role deletion polling to tolerate temporary role visibility after the deletion task record disappears, and refreshes tests and mocks to cover that lag-aware verification path.

Changes

IAM service-linked role deletion polling

Layer / File(s)	Summary
Deletion verification polling `aws/resources/iam_service_linked_role.go`	When `GetServiceLinkedRoleDeletionStatus` returns `NoSuchEntity`, the code now verifies the role with `GetRole` and keeps polling with `false, nil` if the role is still present.
Lag-aware test coverage `aws/resources/iam_service_linked_role_test.go`	The mock now supports sequenced `GetRole` results, the deletion test now expects success when the role briefly remains visible before disappearing, and the throttling message assertion is updated.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

gruntwork-io/cloud-nuke#1143 — The PR implements the described fix by verifying the role with GetRole when deletion status lookup returns NoSuchEntity.

Possibly related PRs

gruntwork-io/cloud-nuke#1147 — Both PRs touch deleteIAMServiceLinkedRole and its tests around GetRole-based verification after a missing deletion status record.

Suggested reviewers

yhakbar
denis256

Poem

A role said, “I’m gone!” but IAM said, “Hmm… maybe.”
So polling got patient, calm, and wavy.
First seen, then gone, in lag’s little dance,
The tests now follow that fleeting chance.
Debug logs wink: “Just wait one more glance.”

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly matches the main change: continuing polling when service-linked-role deletion sees read-after-write lag.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description is detailed and on-topic, but it omits several template sections like TODOs, release notes, and migration guide.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch james/lib-5166-slr-followup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

When the deletion-status poll returns NoSuchEntity, the verify-via-GetRole branch now treats a still-visible role as IAM read-after-write lag and keeps polling rather than failing, consistent with how the in-progress status is handled. AWS has already accepted the deletion at that point, so a persistent "still present" state is implausible and the surrounding timeout still bounds a genuinely stuck delete. Also document the new iam:GetRole requirement in the code comment and lowercase a test error string per review.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@aws/resources/iam_service_linked_role_test.go`:
- Around line 38-46: Guard the GetRole sequence handling in the IAM mock against
a nil getRoleCalls pointer: in the GetRoleErrSeq branch, add a nil check before
dereferencing or incrementing m.getRoleCalls so future tests that set
GetRoleErrSeq without initializing it won’t panic. Update the sequence logic in
the mock method that reads GetRoleErrSeq and advances getRoleCalls to fall back
safely when the counter is unset.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8b44fb8e-6f02-439e-8ba3-eb0058efeaa6

📥 Commits

Reviewing files that changed from the base of the PR and between 2ed0ab0 and 82ea741.

📒 Files selected for processing (2)

aws/resources/iam_service_linked_role.go
aws/resources/iam_service_linked_role_test.go

coderabbitai · 2026-06-13T23:19:22Z

+	err := m.GetRoleErr
+	if len(m.GetRoleErrSeq) > 0 {
+		i := *m.getRoleCalls
+		if i >= len(m.GetRoleErrSeq) {
+			i = len(m.GetRoleErrSeq) - 1
+		}
+		*m.getRoleCalls++
+		err = m.GetRoleErrSeq[i]
+	}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard against nil pointer dereference in the GetRole sequence logic.

If a future test sets GetRoleErrSeq without initializing getRoleCalls, line 40 will panic when dereferencing nil. The current test at line 174 correctly initializes it, but the mock should be more defensive.

🛡️ Suggested fix to add a nil check

func (m mockedIAMServiceLinkedRoles) GetRole(ctx context.Context, params *iam.GetRoleInput, optFns ...func(*iam.Options)) (*iam.GetRoleOutput, error) { err := m.GetRoleErr - if len(m.GetRoleErrSeq) > 0 { + if len(m.GetRoleErrSeq) > 0 && m.getRoleCalls != nil { i := *m.getRoleCalls if i >= len(m.GetRoleErrSeq) { i = len(m.GetRoleErrSeq) - 1

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

err := m.GetRoleErr

if len(m.GetRoleErrSeq) > 0 {

i := *m.getRoleCalls

if i >= len(m.GetRoleErrSeq) {

i = len(m.GetRoleErrSeq) - 1

}

*m.getRoleCalls++

err = m.GetRoleErrSeq[i]

}

err := m.GetRoleErr

if len(m.GetRoleErrSeq) > 0 && m.getRoleCalls != nil {

i := *m.getRoleCalls

if i >= len(m.GetRoleErrSeq) {

i = len(m.GetRoleErrSeq) - 1

}

*m.getRoleCalls++

err = m.GetRoleErrSeq[i]

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@aws/resources/iam_service_linked_role_test.go` around lines 38 - 46, Guard the GetRole sequence handling in the IAM mock against a nil getRoleCalls pointer: in the GetRoleErrSeq branch, add a nil check before dereferencing or incrementing m.getRoleCalls so future tests that set GetRoleErrSeq without initializing it won’t panic. Update the sequence logic in the mock method that reads GetRoleErrSeq and advances getRoleCalls to fall back safely when the counter is unset.

james00012 marked this pull request as ready for review June 13, 2026 23:12

james00012 requested review from denis256 and yhakbar as code owners June 13, 2026 23:12

james00012 changed the base branch from fix-service-linked-role-fast-delete to master June 13, 2026 23:14

james00012 force-pushed the james/lib-5166-slr-followup branch from 47669c0 to 82ea741 Compare June 13, 2026 23:14

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

denis256 approved these changes Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: keep polling service-linked-role delete on read-after-write lag#1149

fix: keep polling service-linked-role delete on read-after-write lag#1149
james00012 wants to merge 1 commit into
masterfrom
james/lib-5166-slr-followup

james00012 commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

james00012 commented Jun 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Intent

Decision

What changed

Illustration

Validation

Known limitations / follow-ups

Summary by CodeRabbit

Bug Fixes

Uh oh!

coderabbitai Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

james00012 commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading