fix(tern): finalize applies on the rollout-projected state, not per-op engine result#347
Kiran01bm wants to merge 2 commits
…p engine result Apply-level terminal side-effects (completed_at, active-applies metric, pending stop completion, observer teardown, stop-polling) now gate on the rollout projection over all operation rows instead of the current operation's engine result, so an on_failure=continue rollout holds the apply active until siblings settle. No-op for single-operation applies.
Pull request overview
This PR adjusts Tern’s apply finalization logic so apply-level terminal side-effects (e.g., stamping completed_at, stopping polling, observer teardown) are gated on the rollout-projected apply state derived across sibling operations, rather than the current operation’s engine result. This prevents premature “terminalization” under on_failure=continue when one sibling reaches a terminal engine state while others are still in flight.
Changes:
- Switch progress polling and grouped driver quiescing decisions to use the rollout-projected apply state (`deriveAggregateApplyState`), via a shared `applyQuiesceDecision` helper.
- Remove `taskStateToApplyState` and update documentation to reflect that apply state is derived by aggregation rather than single-task mapping.
- Add unit coverage for `applyQuiesceDecision`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/tern/state_converters.go | Removes now-obsolete single-task → apply-state helper. |
| pkg/tern/local_client.go | Updates Progress to terminalize/quiesce based on rollout projection and shared helper. |
| pkg/tern/local_apply_grouped.go | Introduces applyQuiesceDecision and gates grouped driver side-effects on projected apply state. |
| pkg/tern/apply_states_test.go | Adds unit test coverage for applyQuiesceDecision. |
| pkg/state/README.md | Updates state-derivation documentation after removing taskStateToApplyState. |
Comments suppressed due to low confidence (1)
pkg/tern/local_apply_grouped.go:677
- When the projected apply state is STOPPED, completed_at should remain nil (STOPPED is resumable). This block currently stamps completed_at for all quiesced states except failed_retryable, which conflicts with markApplyStopped() and sequential finalization behavior.
	if retryableFailure {
		apply.CompletedAt = nil
	} else {
		apply.CompletedAt = &now
	}
…re message from task rows Stopped is terminal but resumable, so quiescing must not stamp completed_at. Gate stamping on a dedicated resumable check. Under on_failure=continue the rollout projection can resolve an apply to a failure via a sibling while the finishing operation's engine result is non-failed, leaving the apply persisted as failed with no message; fall back to the failed task rows in that case.
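The two rules in this commit message can be sketched roughly as below. This is an illustrative sketch, not the code in the PR: the `Apply` and `TaskRow` types, the `isResumable`, `finalizeApplyAt`, and `finalizeApply` helpers, and the lowercase state strings are all assumptions made for the example.

```go
package main

import "time"

// Apply and TaskRow are hypothetical, trimmed-down stand-ins for the real rows.
type Apply struct {
	State        string
	CompletedAt  *time.Time
	ErrorMessage string
}

type TaskRow struct {
	State        string
	ErrorMessage string
}

// isResumable is the assumed "dedicated resumable check": stopped and
// failed_retryable applies can be picked up again, so they are not complete.
func isResumable(state string) bool {
	return state == "stopped" || state == "failed_retryable"
}

// finalizeApplyAt applies the quiesce side-effects for a projected state.
func finalizeApplyAt(a *Apply, projected, engineMsg string, tasks []TaskRow, now time.Time) {
	a.State = projected
	if isResumable(projected) {
		a.CompletedAt = nil // resumable: leave completed_at unset
	} else {
		a.CompletedAt = &now
	}
	a.ErrorMessage = engineMsg
	if projected == "failed" && engineMsg == "" {
		// The finishing operation did not fail itself; borrow the message
		// from the first failed task row that carries one.
		for _, t := range tasks {
			if t.State == "failed" && t.ErrorMessage != "" {
				a.ErrorMessage = t.ErrorMessage
				break
			}
		}
	}
}

// finalizeApply is a convenience wrapper that stamps the current UTC time.
func finalizeApply(a *Apply, projected, engineMsg string, tasks []TaskRow) {
	finalizeApplyAt(a, projected, engineMsg, tasks, time.Now().UTC())
}
```

With this shape, a `stopped` projection keeps `CompletedAt` nil, while a `failed` projection reached through a sibling still ends up with a non-empty message.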
I think this still needs one more split before merge: “the claimed operation is done” and “the aggregate apply should quiesce” are separate decisions. In Suggested shape:
The follow-up in #343 addresses the operator-side “mark operation from its own tasks” bridge, but it still depends on this drive returning when the current operation has settled. A focused test for “current op completed/failed, sibling pending, aggregate parent remains running” would cover the case.
This comment was generated by Amp.
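One way to picture the split this comment asks for is a decision value with two independent fields. The sketch below is hypothetical: `State`, `DriveDecision`, and `decide` are names invented for illustration, and the real drive loop may carry more information than this.

```go
package main

// State is a hypothetical operation state for this sketch.
type State string

const (
	Pending   State = "pending"
	Running   State = "running"
	Failed    State = "failed"
	Completed State = "completed"
)

// terminal reports whether an operation has settled.
func terminal(s State) bool {
	return s == Failed || s == Completed
}

// DriveDecision keeps the two questions apart.
type DriveDecision struct {
	OperationDone bool // the claimed operation has settled; the drive may return
	ApplyQuiesce  bool // every operation has settled; apply-level side-effects may run
}

// decide answers both questions from the current operation and its siblings.
func decide(current State, siblings []State) DriveDecision {
	d := DriveDecision{
		OperationDone: terminal(current),
		ApplyQuiesce:  terminal(current),
	}
	for _, s := range siblings {
		if !terminal(s) {
			d.ApplyQuiesce = false // a sibling in flight holds the apply open
		}
	}
	return d
}
```

For the case named in the comment, `decide(Failed, []State{Pending})` reports the operation as done while leaving the apply unquiesced.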
I’m also worried about the
I think this path should derive the current operation from tasks scoped to the active task’s A focused test would be: one apply with two operation IDs, a failed
This comment was generated by Amp.
What
Apply-level terminal side-effects (stamping `completed_at`, dropping the active-applies metric, completing pending stop requests, tearing down observers, and stopping polling) now gate on the rollout-projected apply state (`deriveAggregateApplyState` over all operation rows) rather than the current operation's engine result. The check is extracted into a small `applyQuiesceDecision` helper used by both the grouped driver and the progress writer.
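To make the gating concrete, here is a minimal sketch of a projection plus a quiesce check. It is not the PR's implementation: `OpState`, `projectApply`, and `shouldQuiesce` are hypothetical stand-ins for the real types, `deriveAggregateApplyState`, and `applyQuiesceDecision`, and the precedence rule (failed over stopped over completed) is an assumption.

```go
package main

// OpState is a hypothetical per-operation state for this sketch.
type OpState string

const (
	OpPending   OpState = "pending"
	OpRunning   OpState = "running"
	OpStopped   OpState = "stopped"
	OpFailed    OpState = "failed"
	OpCompleted OpState = "completed"
)

// settled reports whether a state is no longer in flight.
func settled(s OpState) bool {
	return s == OpStopped || s == OpFailed || s == OpCompleted
}

// projectApply stands in for deriveAggregateApplyState. Assumed rule:
// any in-flight operation holds the apply running; once all have settled,
// failed outranks stopped, which outranks completed.
func projectApply(ops []OpState) OpState {
	if len(ops) == 0 {
		return OpPending
	}
	allPending, anyInFlight := true, false
	anyFailed, anyStopped := false, false
	for _, s := range ops {
		if s != OpPending {
			allPending = false
		}
		if !settled(s) {
			anyInFlight = true
		}
		switch s {
		case OpFailed:
			anyFailed = true
		case OpStopped:
			anyStopped = true
		}
	}
	switch {
	case allPending:
		return OpPending
	case anyInFlight:
		return OpRunning
	case anyFailed:
		return OpFailed
	case anyStopped:
		return OpStopped
	default:
		return OpCompleted
	}
}

// shouldQuiesce stands in for applyQuiesceDecision: apply-level side-effects
// run only when the projection over all operations has settled.
func shouldQuiesce(ops []OpState) bool {
	return settled(projectApply(ops))
}
```

Under these assumptions, a failed operation next to a running sibling projects to running and does not quiesce, while a single-operation apply projects to that operation's own state.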
Why
Under `on_failure=continue`, one operation can reach a terminal engine result (e.g. failed) while the rollout projection legitimately holds the apply running because siblings are still in flight. Keying finalization off the per-operation result would prematurely stamp `completed_at` on a running apply, double-drop the active-applies metric, tear down observers, and stop polling before siblings finish. Gating on the projection fixes that and keeps a `completed` last operation from terminalizing a `failed`-projected parent.

This is a no-op for single-operation applies: the projection collapses to the current operation's derived state, so today's behavior is unchanged. It only takes effect once an apply owns more than one operation.
Stacking / sequencing
Branches off `main` (its tern files are disjoint from the operation-state-first persistence work in #343), but it pairs with that change and should land after it. The next slice, making the projected apply-state writes compare-and-swap so concurrent sibling drives stop last-write-wins clobbering, stacks on this branch.
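For readers unfamiliar with the compare-and-swap idea mentioned for the next slice, the following is a generic in-memory sketch, not a preview of that change. `applyRow`, `applyStore`, `CompareAndSwapState`, and `writeProjected` are hypothetical names, and the version column is an assumption; a database-backed version would typically express the same check as a conditional update on the version.

```go
package main

import "sync"

// applyRow is a hypothetical apply record with an optimistic-lock version.
type applyRow struct {
	State   string
	Version int64
}

// applyStore is an in-memory stand-in for the persistence layer.
type applyStore struct {
	mu  sync.Mutex
	row applyRow
}

// Load returns a snapshot of the current row.
func (s *applyStore) Load() applyRow {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.row
}

// CompareAndSwapState writes next only if the row still has the version the
// caller read. It returns false when another writer got there first.
func (s *applyStore) CompareAndSwapState(expectedVersion int64, next string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.row.Version != expectedVersion {
		return false
	}
	s.row.State = next
	s.row.Version++
	return true
}

// writeProjected re-reads and re-projects until its write lands, so a stale
// projection from one sibling drive cannot overwrite a newer one.
func writeProjected(s *applyStore, project func() string) string {
	for {
		cur := s.Load()
		next := project()
		if s.CompareAndSwapState(cur.Version, next) {
			return next
		}
	}
}
```

The point of the retry loop is that a losing writer recomputes the projection from fresh rows instead of silently overwriting the winner.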