fix(operator): halt remaining siblings on stop under continue #349
Open
Kiran01bm wants to merge 2 commits into
When the rollout settles to failed, surface the reason from the first failed operation row instead of leaving whatever message the last-driven operation wrote. Under on_failure "continue" the last driver may be a successful sibling, which would otherwise leave the failed verdict with no matching reason.
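The selection rule described above could be sketched roughly as follows. This is a minimal in-memory illustration, not the operator's actual code: the `Operation` type, the `OpState` constants, and `firstFailedReason` are hypothetical names introduced here, and the slice order stands in for operation row order.

```go
package main

import "fmt"

// OpState and Operation are hypothetical stand-ins for operation rows.
type OpState string

const (
	StateCompleted OpState = "completed"
	StateFailed    OpState = "failed"
)

type Operation struct {
	Name    string
	State   OpState
	Message string
}

// firstFailedReason returns the message of the first failed operation in
// row order. The boolean is false when no operation has failed.
func firstFailedReason(ops []Operation) (string, bool) {
	for _, op := range ops {
		if op.State == StateFailed {
			return op.Message, true
		}
	}
	return "", false
}

func main() {
	// Under "continue", the last-driven sibling may have succeeded,
	// so its message is not the one to surface.
	ops := []Operation{
		{Name: "deploy-a", State: StateFailed, Message: "schema change rejected"},
		{Name: "deploy-b", State: StateCompleted, Message: "applied"},
	}
	if reason, ok := firstFailedReason(ops); ok {
		fmt.Println("rollout failed:", reason)
	}
}
```

Picking the first failed row, rather than the most recently written message, keeps the failed verdict and its reason tied to the same operation.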
Under on_failure "continue" a stop did not stop the rollout: the operation claim predicate had no stop check, so a continue-exempted pending sibling was still started. Gate the pending claim on a pending stop, then terminalize the pending siblings so the apply settles instead of being stranded in running with siblings that the gate keeps from ever starting.
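The claim gate can be pictured with a small in-memory sketch. The real change applies to database queries; here the `Apply` and `Operation` types, the `StopRequested` field, and `claimNextPending` are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Operation and Apply are hypothetical in-memory stand-ins for the rows
// the operator works with.
type Operation struct {
	Name  string
	State string // "pending", "running", "failed", ...
}

type Apply struct {
	StopRequested bool
	Ops           []*Operation
}

// claimNextPending moves the first pending operation to running and returns
// it. It claims nothing while a stop is pending, which is the missing check
// the commit message describes.
func claimNextPending(a *Apply) *Operation {
	if a.StopRequested {
		return nil
	}
	for _, op := range a.Ops {
		if op.State == "pending" {
			op.State = "running"
			return op
		}
	}
	return nil
}

func main() {
	a := &Apply{
		StopRequested: true,
		Ops: []*Operation{
			{Name: "deploy-a", State: "failed"},
			{Name: "deploy-b", State: "pending"},
		},
	}
	// With a stop pending, the continue-exempted sibling is not started.
	fmt.Println(claimNextPending(a) == nil)
}
```

In this sketch the stop check sits in front of the state transition, so a sibling that "continue" would otherwise allow to run stays pending once the user has asked to stop.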
Base automatically changed from kiran01bm/aggregate-error-message to kiran01bm/operation-state-first-persistence June 15, 2026 11:54
aparajon approved these changes Jun 15, 2026
What & why
Stacked on #345 (→ #343). Makes a user `stop` actually halt a rollout running under `on_failure: continue`.

The bug: under `continue`, a failed earlier deployment no longer blocks later siblings, but `stop` failed to stop them. The operation claim predicate had no stop check, so a continue-exempted pending sibling was still started after the user asked to stop.

The fix:

- Gate the pending claim on a pending stop (the `FindNextApplyOperation` SELECT and the pending→running UPDATE). Stale-active recovery is untouched, so an in-flight drive still resumes and observes the stop.
- Terminalize the pending siblings to `stopped` so the aggregate settles terminal, then complete the stop. Driven inline in `recoverApplyOperation` when an op is being driven, and via a dedicated `recoverApplyPendingStop` claim path when nothing is active (which would otherwise strand a failed-continue apply).

Only `pending` rows are touched (never running/terminal), `completed_at` stays nil (stopped is resumable), writes are apply-lease guarded. Dormant until multi-deployment fan-out lands (one operation per apply today).
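The invariants listed above (only pending rows change, `completed_at` stays nil) might look like this in a simplified in-memory form. The `Operation` type, `stopPendingSiblings`, and `settled` are hypothetical names for illustration, and the apply-lease guard is not modeled here.

```go
package main

import (
	"fmt"
	"time"
)

// Operation is a hypothetical stand-in for an operation row.
type Operation struct {
	Name        string
	State       string     // "pending", "running", "completed", "failed", "stopped"
	CompletedAt *time.Time // stays nil for stopped rows, since stopped is resumable
}

// stopPendingSiblings marks every pending operation as stopped and returns
// how many rows it changed. Running and terminal rows are left alone, and
// CompletedAt is never set.
func stopPendingSiblings(ops []*Operation) int {
	changed := 0
	for _, op := range ops {
		if op.State == "pending" {
			op.State = "stopped"
			changed++
		}
	}
	return changed
}

// settled reports whether no operation is still pending or running.
func settled(ops []*Operation) bool {
	for _, op := range ops {
		if op.State == "pending" || op.State == "running" {
			return false
		}
	}
	return true
}

func main() {
	done := time.Now()
	ops := []*Operation{
		{Name: "deploy-a", State: "failed", CompletedAt: &done},
		{Name: "deploy-b", State: "pending"},
		{Name: "deploy-c", State: "pending"},
	}
	fmt.Println("stopped:", stopPendingSiblings(ops), "settled:", settled(ops))
}
```

Once the pending siblings are stopped, nothing remains that the claim gate would block, so the aggregate can settle instead of waiting on operations that will never start.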