fix(tern): finalize applies on the rollout-projected state, not per-op engine result by Kiran01bm · Pull Request #347 · block/schemabot

Kiran01bm · 2026-06-15T05:08:04Z

What

Apply-level terminal side-effects — stamping completed_at, dropping the
active-applies metric, completing pending stop requests, tearing down observers,
and stopping polling — now gate on the rollout-projected apply state
(deriveAggregateApplyState over all operation rows) rather than the current
operation's engine result. Extracted into a small applyQuiesceDecision helper
used by both the grouped driver and the progress writer.

Why

Under on_failure=continue, one operation can reach a terminal engine result
(e.g. failed) while the rollout projection legitimately holds the apply running
because siblings are still in flight. Keying finalization off the per-operation
result would prematurely stamp completed_at on a running apply, double-drop the
active-applies metric, tear down observers, and stop polling before siblings
finish. Gating on the projection fixes that and keeps a completed last
operation from terminalizing a failed-projected parent.

This is a no-op for single-operation applies: the projection collapses to the
current operation's derived state, so today's behavior is unchanged. It only
takes effect once an apply owns more than one operation.

Stacking / sequencing

Branches off main (its tern files are disjoint from the operation-state-first
persistence work in #343), but it pairs with that change and should land after
it. The next slice — making the projected apply-state writes compare-and-swap so
concurrent sibling drives stop last-write-wins clobbering — stacks on this branch.

…p engine result Apply-level terminal side-effects (completed_at, active-applies metric, pending stop completion, observer teardown, stop-polling) now gate on the rollout projection over all operation rows instead of the current operation's engine result, so an on_failure=continue rollout holds the apply active until siblings settle. No-op for single-operation applies.

Copilot

Pull request overview

This PR adjusts Tern’s apply finalization logic so apply-level terminal side-effects (e.g., stamping completed_at, stopping polling, observer teardown) are gated on the rollout-projected apply state derived across sibling operations, rather than the current operation’s engine result. This prevents premature “terminalization” under on_failure=continue when one sibling reaches a terminal engine state while others are still in flight.

Changes:

Switch progress polling and grouped driver quiescing decisions to use the rollout-projected apply state (deriveAggregateApplyState), via a shared applyQuiesceDecision helper.
Remove taskStateToApplyState and update documentation to reflect that apply state is derived by aggregation rather than single-task mapping.
Add unit coverage for applyQuiesceDecision.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
pkg/tern/state_converters.go	Removes now-obsolete single-task → apply-state helper.
pkg/tern/local_client.go	Updates `Progress` to terminalize/quiesce based on rollout projection and shared helper.
pkg/tern/local_apply_grouped.go	Introduces `applyQuiesceDecision` and gates grouped driver side-effects on projected apply state.
pkg/tern/apply_states_test.go	Adds unit test coverage for `applyQuiesceDecision`.
pkg/state/README.md	Updates state-derivation documentation after removing `taskStateToApplyState`.

Comments suppressed due to low confidence (1)

pkg/tern/local_apply_grouped.go:677

When the projected apply state is STOPPED, completed_at should remain nil (STOPPED is resumable). This block currently stamps completed_at for all quiesced states except failed_retryable, which conflicts with markApplyStopped() and sequential finalization behavior.

		if retryableFailure {
			apply.CompletedAt = nil
		} else {
			apply.CompletedAt = &now
		}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…re message from task rows Stopped is terminal but resumable, so quiescing must not stamp completed_at. Gate stamping on a dedicated resumable check. Under on_failure=continue the rollout projection can resolve an apply to a failure via a sibling while the finishing operation's engine result is non-failed, leaving the apply persisted as failed with no message; fall back to the failed task rows in that case.

chatgpt-codex-connector · 2026-06-15T06:28:41Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

aparajon · 2026-06-15T11:52:02Z

I think this still needs one more split before merge: “the claimed operation is done” and “the aggregate apply should quiesce” are separate decisions.

In handleAtomicProgressTick, the current operation’s tasks can derive completed/failed while deriveAggregateApplyState correctly keeps the parent apply running because a sibling is still pending/running under on_failure: continue. Gating the return/stop-polling decision only on applyQuiesceDecision(apply.State) means the current operation drive can keep polling even though its own operation is done.

Suggested shape:

derive the current operation state from the scoped task set;
if the current operation is terminal / retry-paused, return from this operation drive so the operator can persist the claimed apply_operation row;
only run apply-level side effects (completed_at, active-apply metric, terminal observer, stop request completion) when the rollout-projected aggregate apply state quiesces.

The followup in #343 addresses the operator-side “mark operation from its own tasks” bridge, but it still depends on this drive returning when the current operation has settled. A focused test for “current op completed/failed, sibling pending, aggregate parent remains running” would cover the case.

This comment was generated by Amp.

aparajon · 2026-06-15T11:52:17Z

I’m also worried about the Progress path using the whole apply’s task set for aggregate derivation.

currentApplyTasks := filterTasksByApply(tasks, activeTask.ApplyID) can include tasks from multiple apply_operation_ids once an apply fans out. Passing that mixed set into deriveAggregateApplyState makes applyOperationIDForTasks fail; today that failure is treated the same as the legacy/no-operation-model case and falls back to task-only derivation. That can bypass sibling operation rows and on_failure: continue policy, so a progress poll could clobber the rollout-projected parent state from an ambiguous task set.

I think this path should derive the current operation from tasks scoped to the active task’s apply_operation_id, and deriveAggregateApplyState should distinguish “no operation model” from “mixed/missing operation IDs” so the latter fails closed instead of falling back to task-only derivation.

A focused test would be: one apply with two operation IDs, a failed continue sibling plus a pending/running sibling, then a Progress poll must not task-derive the parent to a terminal state or ignore the sibling operation rows.

This comment was generated by Amp.

Copilot AI review requested due to automatic review settings June 15, 2026 05:08

Copilot started reviewing on behalf of Kiran01bm June 15, 2026 05:08 View session

Copilot AI reviewed Jun 15, 2026

View reviewed changes

Comment thread pkg/tern/local_client.go Outdated

Comment thread pkg/tern/local_client.go

Comment thread pkg/tern/local_apply_grouped.go Outdated

Kiran01bm marked this pull request as ready for review June 15, 2026 06:28

Kiran01bm requested review from aparajon and morgo as code owners June 15, 2026 06:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tern): finalize applies on the rollout-projected state, not per-op engine result#347

fix(tern): finalize applies on the rollout-projected state, not per-op engine result#347
Kiran01bm wants to merge 2 commits into
mainfrom
kiran01bm/rollout-continue-terminal-side-effects

Kiran01bm commented Jun 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot commented Jun 15, 2026

Uh oh!

aparajon commented Jun 15, 2026

Uh oh!

aparajon commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Kiran01bm commented Jun 15, 2026

What

Why

Stacking / sequencing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot commented Jun 15, 2026

Uh oh!

aparajon commented Jun 15, 2026

Uh oh!

aparajon commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants