
Commit c771542

bdchatham and claude authored
refactor: planner owns condition management, not executor (#92)
Move condition ownership from the executor to the planner. The executor now only mutates plan/task state and phase transitions. The planner sets conditions when creating plans and when observing terminal plans on the next reconcile. - Remove setPlanFailedCondition from executor - Remove ConditionPlanFailed constant from executor - Update failTask doc to clarify planner handles conditions - Update planner doc.go with condition ownership section - Update CLAUDE.md Key Patterns with condition ownership, atomic plan creation, and single-patch model documentation - Update test to assert plan failure details instead of condition Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 31df0e8 commit c771542

4 files changed: 24 additions & 44 deletions

CLAUDE.md

Lines changed: 3 additions & 0 deletions
@@ -58,6 +58,9 @@ make docker-push IMG=<image> # Push container image
 - **Plan-driven reconciliation** — Both controllers use ordered task plans (stored in `.status.plan`) to drive lifecycle. Plans are built by `internal/planner/` (`ResolvePlan` for nodes, `ForGroup` for deployments), executed by `planner.Executor`, with individual tasks in `internal/task/`. The reconcile loop is: `ResolvePlan → persist plan → ExecutePlan`. See `internal/planner/doc.go` for the full plan lifecycle.
 - **Init plans** transition nodes from Pending → Running. They include infrastructure tasks (`ensure-data-pvc`, `apply-statefulset`, `apply-service`) followed by sidecar tasks (`configure-genesis`, `config-apply`, etc.).
 - **Convergence plans** keep Running nodes in sync. They contain only `apply-statefulset` + `apply-service` and are nilled from status after completion.
+- **Atomic plan creation** — New plans are persisted before any tasks execute. The reconciler flushes the plan, then requeues. Execution starts on the next reconcile. This guarantees external observers see the plan before side effects occur.
+- **Condition ownership** — The planner owns all condition management on the owning resource. It sets conditions when creating plans (e.g., `NodeUpdateInProgress=True`) and when observing terminal plans (e.g., `NodeUpdateInProgress=False`). The executor does not set conditions — it only mutates plan/task state and phase transitions.
+- **Single-patch model** — All status mutations (plan state, conditions, phase, currentImage) accumulate in-memory during a reconcile and are flushed in a single `Status().Patch()` at the end. Tasks mutate owned resources (StatefulSets, Services, PVCs); the executor mutates plan state in-memory; the reconciler flushes once.
 - **Resource generators** live in `internal/noderesource/` — pure functions that produce StatefulSets, Services, and PVCs from a SeiNode spec. Used by both the controller and plan tasks.
 - **Platform config** is fully environment-driven — all fields in `platform.Config` must be set via env vars (no defaults). See `internal/platform/platform.go` for the full list.
 - **Genesis resolution** is handled by the sidecar autonomously: embedded sei-config for well-known chains, S3 fallback at `{SEI_GENESIS_BUCKET}/{chainID}/genesis.json` for custom chains.
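The single-patch model described above can be sketched as follows. This is a minimal illustration with hypothetical simplified types (`NodeStatus`, `TaskPlan`, `Condition` are stand-ins for the real `seiv1alpha1` API types), and `flush` stands in for the single `Status().Patch()` call; it is not the operator's actual implementation.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the SeiNode status types.
type Condition struct {
	Type, Status string
}

type TaskPlan struct {
	Phase string
}

type NodeStatus struct {
	Plan       *TaskPlan
	Conditions []Condition
	Phase      string
}

// reconcile accumulates every status mutation in-memory, then flushes once.
// flush stands in for a single client.Status().Patch() call at the end.
func reconcile(status *NodeStatus, flush func(*NodeStatus)) {
	// Planner mutates plan and conditions in-memory.
	status.Plan = &TaskPlan{Phase: "Running"}
	status.Conditions = append(status.Conditions,
		Condition{Type: "NodeUpdateInProgress", Status: "True"})
	// Executor mutates plan state and phase in-memory.
	status.Plan.Phase = "Complete"
	status.Phase = "Running"
	// Reconciler flushes everything in one patch.
	flush(status)
}

func main() {
	patches := 0
	st := &NodeStatus{}
	reconcile(st, func(s *NodeStatus) { patches++ })
	fmt.Println(patches, st.Plan.Phase) // prints: 1 Complete
}
```

Whatever the number of in-memory mutations, external observers see exactly one status write per reconcile.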

internal/controller/node/plan_execution_test.go

Lines changed: 2 additions & 12 deletions
@@ -732,18 +732,8 @@ func TestExecutePlan_TaskFailure_SetsPlanFailedCondition(t *testing.T) {
 	g.Expect(err).NotTo(HaveOccurred())
 
 	g.Expect(node.Status.Plan.Phase).To(Equal(seiv1alpha1.TaskPlanFailed))
-
-	// Verify PlanFailed condition was set.
-	var found bool
-	for _, cond := range node.Status.Conditions {
-		if cond.Type == planner.ConditionPlanFailed {
-			g.Expect(cond.Status).To(Equal(metav1.ConditionTrue))
-			g.Expect(cond.Reason).To(Equal("TaskFailed"))
-			g.Expect(cond.Message).To(ContainSubstring("boom"))
-			found = true
-		}
-	}
-	g.Expect(found).To(BeTrue(), "expected PlanFailed condition on node")
+	g.Expect(node.Status.Plan.FailedTaskDetail).NotTo(BeNil())
+	g.Expect(node.Status.Plan.FailedTaskDetail.Error).To(ContainSubstring("boom"))
 }
 
 // --- Nil sidecar client handling ---
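The updated assertions check the plan's own failure record rather than scanning conditions. A minimal sketch of that check, using hypothetical simplified types in place of the real `seiv1alpha1.TaskPlan` and without the Gomega matchers the actual test uses:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical simplified stand-ins for the plan types the test asserts on.
type FailedTaskDetail struct {
	Error string
}

type TaskPlan struct {
	Phase            string
	FailedTaskDetail *FailedTaskDetail
}

// checkFailure mirrors the updated test's shape: the plan itself carries the
// failure detail, so no condition lookup on the node is needed.
func checkFailure(plan *TaskPlan) error {
	if plan.Phase != "Failed" {
		return fmt.Errorf("expected Failed phase, got %q", plan.Phase)
	}
	if plan.FailedTaskDetail == nil {
		return fmt.Errorf("expected FailedTaskDetail to be set")
	}
	if !strings.Contains(plan.FailedTaskDetail.Error, "boom") {
		return fmt.Errorf("expected error to mention the injected failure")
	}
	return nil
}

func main() {
	plan := &TaskPlan{
		Phase:            "Failed",
		FailedTaskDetail: &FailedTaskDetail{Error: "task apply-statefulset: boom"},
	}
	fmt.Println(checkFailure(plan)) // prints: <nil>
}
```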

internal/planner/doc.go

Lines changed: 17 additions & 9 deletions
@@ -8,15 +8,23 @@
 //
 // 1. Build: ResolvePlan (for nodes) or ForGroup (for deployments) inspects the
 //    resource's current phase and spec, then builds an appropriate plan.
-// 2. Persist: The controller patches the plan into the resource's status.
-// 3. Execute: Executor.ExecutePlan drives one task per reconcile, patching
-//    status after each task completes.
-// 4. Complete: When all tasks finish, the executor sets TargetPhase on the
-//    resource and marks the plan Complete. Convergence plans (where the
-//    target phase equals the current phase) are nilled out to avoid stale
-//    data in etcd.
-// 5. Fail: If a task fails terminally, the executor sets FailedPhase and a
-//    PlanFailed condition on the resource for operator observability.
+// 2. Persist: The controller flushes the plan into the resource's status.
+//    Execution does not start until the plan is persisted (atomic creation).
+// 3. Execute: Executor.ExecutePlan drives tasks in-memory. Synchronous tasks
+//    advance in a loop; async tasks return Running for the next reconcile.
+//    The controller flushes all mutations in a single status patch.
+// 4. Complete: When all tasks finish, the executor marks the plan Complete and
+//    sets TargetPhase. On the next reconcile, the planner observes the terminal
+//    plan, updates conditions, and builds the next plan if needed.
+// 5. Fail: If a task fails terminally, the executor marks the plan Failed and
+//    sets FailedPhase. On the next reconcile, the planner observes the failure,
+//    updates conditions, and decides whether to retry.
+//
+// # Condition Ownership
+//
+// The planner owns all condition management on the owning resource. It sets
+// conditions when creating plans and when observing terminal plans. The executor
+// does not set conditions — it only mutates plan/task state and phase transitions.
 //
 // # Plan Types
 //
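Steps 4 and 5 of the lifecycle — the planner reacting to a terminal plan on the next reconcile — can be sketched as below. The types and the `NodeUpdateInProgress`/reason values here are hypothetical simplified stand-ins (the real planner works on `seiv1alpha1` types and would use `meta.SetStatusCondition` on `metav1.Condition` values); this only illustrates the ownership split.

```go
package main

import "fmt"

// Hypothetical simplified stand-ins for the node and condition types.
type Condition struct {
	Type, Status, Reason string
}

type TaskPlan struct{ Phase string }

type Node struct {
	Plan       *TaskPlan
	Conditions []Condition
}

// observeTerminalPlan sketches the planner's role: on the reconcile after a
// plan reaches a terminal phase, the planner (never the executor) updates
// conditions on the owning resource.
func observeTerminalPlan(n *Node) {
	if n.Plan == nil {
		return
	}
	switch n.Plan.Phase {
	case "Complete":
		setCondition(n, Condition{"NodeUpdateInProgress", "False", "PlanComplete"})
	case "Failed":
		setCondition(n, Condition{"NodeUpdateInProgress", "False", "PlanFailed"})
	}
}

// setCondition upserts a condition by type, mimicking meta.SetStatusCondition.
func setCondition(n *Node, c Condition) {
	for i := range n.Conditions {
		if n.Conditions[i].Type == c.Type {
			n.Conditions[i] = c
			return
		}
	}
	n.Conditions = append(n.Conditions, c)
}

func main() {
	n := &Node{
		Plan:       &TaskPlan{Phase: "Failed"},
		Conditions: []Condition{{"NodeUpdateInProgress", "True", "PlanCreated"}},
	}
	observeTerminalPlan(n)
	fmt.Println(n.Conditions[0].Status, n.Conditions[0].Reason) // prints: False PlanFailed
}
```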

internal/planner/executor.go

Lines changed: 2 additions & 23 deletions
@@ -6,7 +6,6 @@ import (
 	"fmt"
 	"time"
 
-	"k8s.io/apimachinery/pkg/api/meta"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	ctrl "sigs.k8s.io/controller-runtime"
 	"sigs.k8s.io/controller-runtime/pkg/client"
@@ -19,9 +18,6 @@ import (
 const (
 	TaskPollInterval = 5 * time.Second
 	maxRetryBackoff  = 30 * time.Second
-
-	// ConditionPlanFailed is set on the node when a plan fails terminally.
-	ConditionPlanFailed = "PlanFailed"
 )
 
 // ResultRequeueImmediate requests an immediate re-enqueue with a minimal
@@ -197,8 +193,8 @@ func advanceTask(
 }
 
 // failTask marks an individual task and the overall plan as Failed.
-// Sets FailedPhase on the node and a PlanFailed condition. All mutations
-// are in-memory.
+// Sets FailedPhase on the node. The planner handles condition updates
+// when it observes the failed plan on the next reconcile.
 func failTask(
 	ctx context.Context,
 	obj client.Object,
@@ -215,7 +211,6 @@ func failTask(
 	t.Error = errMsg
 	plan.Phase = seiv1alpha1.TaskPlanFailed
 	setTargetPhase(obj, plan.FailedPhase)
-	setPlanFailedCondition(obj, plan, t, errMsg)
 	for i := range plan.Tasks {
 		if plan.Tasks[i].ID == t.ID {
 			plan.FailedTaskIndex = &i
@@ -251,22 +246,6 @@ func setTargetPhase(obj client.Object, phase seiv1alpha1.SeiNodePhase) {
 	}
 }
 
-// setPlanFailedCondition sets a PlanFailed condition on the node with
-// details about which task failed and why.
-func setPlanFailedCondition(obj client.Object, plan *seiv1alpha1.TaskPlan, t *seiv1alpha1.PlannedTask, errMsg string) {
-	node, ok := obj.(*seiv1alpha1.SeiNode)
-	if !ok {
-		return
-	}
-	meta.SetStatusCondition(&node.Status.Conditions, metav1.Condition{
-		Type:               ConditionPlanFailed,
-		Status:             metav1.ConditionTrue,
-		Reason:             "TaskFailed",
-		Message:            fmt.Sprintf("plan %s failed at task %s: %s", plan.ID, t.Type, errMsg),
-		ObservedGeneration: node.Generation,
-	})
-}
-
 // currentPhase returns the SeiNode phase, or empty for non-SeiNode objects.
 func currentPhase(obj client.Object) seiv1alpha1.SeiNodePhase {
 	if node, ok := obj.(*seiv1alpha1.SeiNode); ok {
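After this change, the executor's failure path only records plan state. A minimal sketch of that narrowed responsibility, with hypothetical simplified types in place of `seiv1alpha1.TaskPlan` and `client.Object` (the real `failTask` also sets the target phase on the node object):

```go
package main

import "fmt"

// Hypothetical simplified stand-ins for the plan and task types.
type Task struct {
	ID, State, Error string
}

type TaskPlan struct {
	Tasks           []Task
	Phase           string
	FailedTaskIndex *int
}

// failTask marks the task and the plan Failed entirely in-memory. It sets no
// conditions: the reconciler flushes this mutation, and the planner reacts to
// the failed plan (including condition updates) on the next reconcile.
func failTask(plan *TaskPlan, id, errMsg string) {
	for i := range plan.Tasks {
		if plan.Tasks[i].ID == id {
			plan.Tasks[i].State = "Failed"
			plan.Tasks[i].Error = errMsg
			idx := i // copy so the stored pointer is stable
			plan.FailedTaskIndex = &idx
		}
	}
	plan.Phase = "Failed"
}

func main() {
	plan := &TaskPlan{Tasks: []Task{{ID: "apply-statefulset"}}}
	failTask(plan, "apply-statefulset", "boom")
	fmt.Println(plan.Phase, *plan.FailedTaskIndex, plan.Tasks[0].Error) // prints: Failed 0 boom
}
```

Keeping the executor condition-free means every condition a user sees traces back to one owner, the planner, which simplifies reasoning about stale or conflicting conditions.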
