Commit b93d623

bdchatham and claude authored
feat: plan-centric reconcile foundations (Steps 1-4) (#87)
* feat: plan-centric reconcile foundations (Steps 1-4)

  Lay the groundwork for making Plan the central reconciliation abstraction across all SeiNode phases. This is the non-behavioral half of the refactor: all new code is additive and existing behavior is unchanged.

  Step 1: Add TargetPhase/FailedPhase to TaskPlan CRD
  - Executor sets node phase atomically with plan completion/failure
  - Both fields are optional; empty means no transition (backward compatible)

  Step 2: Extract resource generators to internal/noderesource/
  - Moves StatefulSet, Service, and PVC generation out of the controller package
  - Breaks the import cycle that would exist between task and controller
  - Controller now calls noderesource.* directly (no delegate wrappers)

  Step 3: New controller-side tasks (ensure-data-pvc, apply-statefulset, apply-service)
  - Fire-and-forget tasks that complete synchronously in Execute
  - Registered in the task registry, not yet wired into any plan

  Step 4: Executor handles TargetPhase and convergence plan cleanup
  - On plan completion: applies TargetPhase to node status in the same patch
  - On plan failure: applies FailedPhase to node status in the same patch
  - Convergence plans (target == current phase): nils the plan on completion
  - OnPlanComplete/OnPlanFailed callbacks for metrics/events

* refactor: remove callbacks from executor, use Conditions for failure

  Remove the PhaseTransition struct and the OnPlanComplete/OnPlanFailed callbacks. The executor directly owns phase transitions via TargetPhase/FailedPhase without needing callback indirection.

  On plan failure, set a PlanFailed Condition on the node with the failed task type as the Reason and a message describing what went wrong. This follows the standard K8s Conditions pattern already used by monitor tasks.

* fix: Condition Reason to CamelCase, preserve init plan on completion

  Two fixes from expert review:

  1. The PlanFailed Condition Reason now uses "TaskFailed" (CamelCase) instead of the kebab-case task type string. K8s convention requires Reasons to be CamelCase machine-readable tokens. The task type is already in the Message field.

  2. nilPlanIfConvergence now captures the pre-mutation phase and compares against it, so only convergence plans (same phase before and after) get nilled. Init plans (Initializing → Running) keep their completed plan visible in status for observability.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
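The Step 4 executor behavior above can be sketched in miniature. This is an illustrative model, not the executor's actual code: `NodeStatus`, the pared-down `TaskPlan`, and the helper name `finishPlan` are all stand-ins invented here; the real executor applies the same mutations as part of a status patch.

```go
package main

import "fmt"

type SeiNodePhase string

const (
	PhaseInitializing SeiNodePhase = "Initializing"
	PhaseRunning      SeiNodePhase = "Running"
	PhaseFailed       SeiNodePhase = "Failed"
)

// TaskPlan mirrors the two new CRD fields: empty means "no transition".
type TaskPlan struct {
	TargetPhase SeiNodePhase
	FailedPhase SeiNodePhase
}

type NodeStatus struct {
	Phase SeiNodePhase
	Plan  *TaskPlan
}

// finishPlan (hypothetical name) applies the plan's phase transition in
// the same status mutation that records completion/failure, then nils
// convergence plans: plans whose target equals the pre-mutation phase.
func finishPlan(status *NodeStatus, succeeded bool) {
	plan := status.Plan
	if plan == nil {
		return
	}
	phaseBefore := status.Phase // captured before any mutation
	if succeeded {
		if plan.TargetPhase != "" {
			status.Phase = plan.TargetPhase
		}
		// Only convergence plans get nilled; init plans (e.g.
		// Initializing -> Running) stay visible for observability.
		if plan.TargetPhase == phaseBefore {
			status.Plan = nil
		}
	} else if plan.FailedPhase != "" {
		status.Phase = plan.FailedPhase
	}
}

func main() {
	boot := &NodeStatus{Phase: PhaseInitializing, Plan: &TaskPlan{TargetPhase: PhaseRunning, FailedPhase: PhaseFailed}}
	finishPlan(boot, true)
	fmt.Println(boot.Phase, boot.Plan != nil) // init plan kept in status

	conv := &NodeStatus{Phase: PhaseRunning, Plan: &TaskPlan{TargetPhase: PhaseRunning}}
	finishPlan(conv, true)
	fmt.Println(conv.Phase, conv.Plan == nil) // convergence plan nilled
}
```

Capturing `phaseBefore` first is the essence of fix 2 in the commit message: comparing against the post-mutation phase would wrongly nil every plan with a non-empty TargetPhase.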
1 parent: d6e20d1
18 files changed: 726 additions and 262 deletions
api/v1alpha1/seinode_types.go

Lines changed: 14 additions & 2 deletions
@@ -159,8 +159,8 @@ type FailedTaskInfo struct {
 	MaxRetries int `json:"maxRetries"`
 }
 
-// TaskPlan tracks an ordered sequence of sidecar tasks that the controller
-// executes to initialize a node.
+// TaskPlan tracks an ordered sequence of tasks that the controller
+// executes to drive a node toward a target state.
 type TaskPlan struct {
 	// ID is a unique identifier for this plan instance.
 	// +optional
@@ -172,6 +172,18 @@ type TaskPlan struct {
 	// Tasks is the ordered list of tasks to execute.
 	Tasks []PlannedTask `json:"tasks"`
 
+	// TargetPhase is the SeiNodePhase the executor sets on the owning
+	// resource when the plan completes successfully. When empty, the
+	// executor does not perform a phase transition.
+	// +optional
+	TargetPhase SeiNodePhase `json:"targetPhase,omitempty"`
+
+	// FailedPhase is the SeiNodePhase the executor sets on the owning
+	// resource when the plan fails terminally. When empty, the executor
+	// does not perform a phase transition on failure.
+	// +optional
+	FailedPhase SeiNodePhase `json:"failedPhase,omitempty"`
+
 	// FailedTaskIndex is the index of the task that caused the plan to fail.
 	// +optional
 	FailedTaskIndex *int `json:"failedTaskIndex,omitempty"`

config/crd/sei.io_seinodedeployments.yaml

Lines changed: 24 additions & 0 deletions
@@ -864,6 +864,18 @@ spec:
                   Plan tracks the active group-level task plan (genesis assembly,
                   deployment, etc.). Nil when no plan is in progress.
                 properties:
+                  failedPhase:
+                    description: |-
+                      FailedPhase is the SeiNodePhase the executor sets on the owning
+                      resource when the plan fails terminally. When empty, the executor
+                      does not perform a phase transition on failure.
+                    enum:
+                    - Pending
+                    - Initializing
+                    - Running
+                    - Failed
+                    - Terminating
+                    type: string
                   failedTaskDetail:
                     description: FailedTaskDetail records diagnostics about the task
                       that caused the plan to fail.
@@ -905,6 +917,18 @@ spec:
                     - Complete
                     - Failed
                     type: string
+                  targetPhase:
+                    description: |-
+                      TargetPhase is the SeiNodePhase the executor sets on the owning
+                      resource when the plan completes successfully. When empty, the
+                      executor does not perform a phase transition.
+                    enum:
+                    - Pending
+                    - Initializing
+                    - Running
+                    - Failed
+                    - Terminating
+                    type: string
                   tasks:
                     description: Tasks is the ordered list of tasks to execute.
                     items:

config/crd/sei.io_seinodes.yaml

Lines changed: 24 additions & 0 deletions
@@ -628,6 +628,18 @@ spec:
                   Plan tracks the active task sequence for this node. A planner generates
                   the plan based on the node's current state and conditions.
                 properties:
+                  failedPhase:
+                    description: |-
+                      FailedPhase is the SeiNodePhase the executor sets on the owning
+                      resource when the plan fails terminally. When empty, the executor
+                      does not perform a phase transition on failure.
+                    enum:
+                    - Pending
+                    - Initializing
+                    - Running
+                    - Failed
+                    - Terminating
+                    type: string
                   failedTaskDetail:
                     description: FailedTaskDetail records diagnostics about the task
                       that caused the plan to fail.
@@ -669,6 +681,18 @@ spec:
                     - Complete
                     - Failed
                     type: string
+                  targetPhase:
+                    description: |-
+                      TargetPhase is the SeiNodePhase the executor sets on the owning
+                      resource when the plan completes successfully. When empty, the
+                      executor does not perform a phase transition.
+                    enum:
+                    - Pending
+                    - Initializing
+                    - Running
+                    - Failed
+                    - Terminating
+                    type: string
                   tasks:
                     description: Tasks is the ordered list of tasks to execute.
                     items:

internal/controller/node/controller.go

Lines changed: 17 additions & 4 deletions
@@ -20,6 +20,7 @@ import (
 	"sigs.k8s.io/controller-runtime/pkg/predicate"
 
 	seiv1alpha1 "github.com/sei-protocol/sei-k8s-controller/api/v1alpha1"
+	"github.com/sei-protocol/sei-k8s-controller/internal/noderesource"
 	"github.com/sei-protocol/sei-k8s-controller/internal/planner"
 	"github.com/sei-protocol/sei-k8s-controller/internal/platform"
 	"github.com/sei-protocol/sei-k8s-controller/internal/task"
@@ -85,14 +86,18 @@ func (r *SeiNodeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ct
 		return ctrl.Result{}, fmt.Errorf("validating spec: %w", err)
 	}
 
+	// TODO: reconcile peers should become a part of a plan if it needs to be performed.
 	if err := r.reconcilePeers(ctx, node); err != nil {
 		return ctrl.Result{}, fmt.Errorf("reconciling peers: %w", err)
 	}
 
+	// TODO: this should be a part of a plan and not part of the main reconciliation flow.
 	if err := r.ensureNodeDataPVC(ctx, node); err != nil {
 		return ctrl.Result{}, fmt.Errorf("ensuring data PVC: %w", err)
 	}
 
+	// TODO: if the plan becomes the abstraction it should be, then p from planner.ForNode should just take an existing
+	// plan off the node if one exists, otherwise, create one based on the state it sees.
 	switch node.Status.Phase {
 	case "", seiv1alpha1.PhasePending:
 		return r.reconcilePending(ctx, node, p)
@@ -143,6 +148,8 @@ func (r *SeiNodeReconciler) reconcilePending(ctx context.Context, node *seiv1alp
 func (r *SeiNodeReconciler) reconcileInitializing(ctx context.Context, node *seiv1alpha1.SeiNode) (ctrl.Result, error) {
 	plan := node.Status.Plan
 
+	// TODO: This should be a part of the plan, not a special case we need to filter out for the reconciliation before we
+	// go ahead with the actual plan.
 	if !planner.NeedsBootstrap(node) || planner.IsBootstrapComplete(plan) {
 		if err := r.reconcileNodeStatefulSet(ctx, node); err != nil {
 			return ctrl.Result{}, fmt.Errorf("reconciling statefulset: %w", err)
@@ -157,6 +164,9 @@ func (r *SeiNodeReconciler) reconcileInitializing(ctx context.Context, node *sei
 		return result, err
 	}
 
+	// TODO: transitioning should be a part of ExecutePlan, I would think. This way, when a plan is done, it has materialized
+	// the proper state and does some final patch to the node, basically leaving it in a state that other logic can observe and
+	// act on as well. For instance, if a plan succeeds, the plan knows what state to put it in. Same for if it fails.
 	if plan.Phase == seiv1alpha1.TaskPlanComplete {
 		return r.transitionPhase(ctx, node, seiv1alpha1.PhaseRunning)
 	}
@@ -168,9 +178,12 @@ func (r *SeiNodeReconciler) reconcileInitializing(ctx context.Context, node *sei
 
 // reconcileRunning converges owned resources and handles runtime tasks.
 func (r *SeiNodeReconciler) reconcileRunning(ctx context.Context, node *seiv1alpha1.SeiNode) (ctrl.Result, error) {
+
+	// TODO: ideally this becomes a task in a plan
 	if err := r.reconcileNodeStatefulSet(ctx, node); err != nil {
 		return ctrl.Result{}, fmt.Errorf("reconciling statefulset: %w", err)
 	}
+	// TODO: ideally this becomes a task in a plan
 	if err := r.reconcileNodeService(ctx, node); err != nil {
 		return ctrl.Result{}, fmt.Errorf("reconciling service: %w", err)
 	}
@@ -286,7 +299,7 @@ func (r *SeiNodeReconciler) handleNodeDeletion(ctx context.Context, node *seiv1a
 
 func (r *SeiNodeReconciler) deleteNodeDataPVC(ctx context.Context, node *seiv1alpha1.SeiNode) error {
 	pvc := &corev1.PersistentVolumeClaim{}
-	err := r.Get(ctx, types.NamespacedName{Name: nodeDataPVCName(node), Namespace: node.Namespace}, pvc)
+	err := r.Get(ctx, types.NamespacedName{Name: noderesource.DataPVCName(node), Namespace: node.Namespace}, pvc)
 	if apierrors.IsNotFound(err) {
 		return nil
 	}
@@ -297,7 +310,7 @@ func (r *SeiNodeReconciler) deleteNodeDataPVC(ctx context.Context, node *seiv1al
 }
 
 func (r *SeiNodeReconciler) ensureNodeDataPVC(ctx context.Context, node *seiv1alpha1.SeiNode) error {
-	desired := generateNodeDataPVC(node, r.Platform)
+	desired := noderesource.GenerateDataPVC(node, r.Platform)
 	if err := ctrl.SetControllerReference(node, desired, r.Scheme); err != nil {
 		return fmt.Errorf("setting owner reference: %w", err)
 	}
@@ -311,7 +324,7 @@ func (r *SeiNodeReconciler) ensureNodeDataPVC(ctx context.Context, node *seiv1al
 }
 
 func (r *SeiNodeReconciler) reconcileNodeStatefulSet(ctx context.Context, node *seiv1alpha1.SeiNode) error {
-	desired := generateNodeStatefulSet(node, r.Platform)
+	desired := noderesource.GenerateStatefulSet(node, r.Platform)
 	desired.SetGroupVersionKind(appsv1.SchemeGroupVersion.WithKind("StatefulSet"))
 	if err := ctrl.SetControllerReference(node, desired, r.Scheme); err != nil {
 		return fmt.Errorf("setting owner reference: %w", err)
@@ -321,7 +334,7 @@ func (r *SeiNodeReconciler) reconcileNodeStatefulSet(ctx context.Context, node *
 }
 
 func (r *SeiNodeReconciler) reconcileNodeService(ctx context.Context, node *seiv1alpha1.SeiNode) error {
-	desired := generateNodeHeadlessService(node)
+	desired := noderesource.GenerateHeadlessService(node)
 	desired.SetGroupVersionKind(corev1.SchemeGroupVersion.WithKind("Service"))
 	if err := ctrl.SetControllerReference(node, desired, r.Scheme); err != nil {
 		return fmt.Errorf("setting owner reference: %w", err)

internal/controller/node/labels.go

Lines changed: 0 additions & 38 deletions
This file was deleted.

internal/controller/node/plan_execution.go

Lines changed: 2 additions & 1 deletion
@@ -9,6 +9,7 @@ import (
 	"sigs.k8s.io/controller-runtime/pkg/log"
 
 	seiv1alpha1 "github.com/sei-protocol/sei-k8s-controller/api/v1alpha1"
+	"github.com/sei-protocol/sei-k8s-controller/internal/noderesource"
 	"github.com/sei-protocol/sei-k8s-controller/internal/planner"
 	"github.com/sei-protocol/sei-k8s-controller/internal/task"
 )
@@ -18,7 +19,7 @@ func (r *SeiNodeReconciler) buildSidecarClient(node *seiv1alpha1.SeiNode) task.S
 	if r.BuildSidecarClientFn != nil {
 		return r.BuildSidecarClientFn(node)
 	}
-	c, err := sidecar.NewSidecarClientFromPodDNS(node.Name, node.Namespace, sidecarPort(node))
+	c, err := sidecar.NewSidecarClientFromPodDNS(node.Name, node.Namespace, noderesource.SidecarPort(node))
 	if err != nil {
 		log.Log.Info("failed to build sidecar client", "node", node.Name, "error", err)
 		return nil
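The PlanFailed Condition described in the commit message follows the standard K8s convention: the Reason is a fixed CamelCase token, and the kebab-case task type goes in the human-readable Message. A self-contained sketch of that shape; the `Condition` struct here stands in for `metav1.Condition`, and `newPlanFailedCondition` is a hypothetical helper name, not the repo's actual function:

```go
package main

import (
	"fmt"
	"time"
)

// Condition mirrors the shape of metav1.Condition closely enough to
// show the convention; the real code would use k8s.io/apimachinery.
type Condition struct {
	Type               string
	Status             string
	Reason             string // must be a CamelCase machine-readable token
	Message            string
	LastTransitionTime time.Time
}

// newPlanFailedCondition builds the failure condition: Reason is the
// fixed token "TaskFailed" (never the kebab-case task type), while the
// task type and detail live in Message for humans.
func newPlanFailedCondition(taskType, detail string) Condition {
	return Condition{
		Type:               "PlanFailed",
		Status:             "True",
		Reason:             "TaskFailed",
		Message:            fmt.Sprintf("task %q failed: %s", taskType, detail),
		LastTransitionTime: time.Now(),
	}
}

func main() {
	c := newPlanFailedCondition("apply-statefulset", "server-side apply rejected")
	fmt.Println(c.Reason)
	fmt.Println(c.Message)
}
```

Putting the task type in Message rather than Reason matters because consumers match on Reason programmatically, and kebab-case strings like "apply-statefulset" break the CamelCase contract.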

internal/controller/node/reconciler_test.go

Lines changed: 3 additions & 2 deletions
@@ -17,6 +17,7 @@ import (
 	"sigs.k8s.io/controller-runtime/pkg/client/fake"
 
 	seiv1alpha1 "github.com/sei-protocol/sei-k8s-controller/api/v1alpha1"
+	"github.com/sei-protocol/sei-k8s-controller/internal/noderesource"
 	"github.com/sei-protocol/sei-k8s-controller/internal/planner"
 	"github.com/sei-protocol/sei-k8s-controller/internal/platform/platformtest"
 	"github.com/sei-protocol/sei-k8s-controller/internal/task"
@@ -194,7 +195,7 @@ func TestNodeReconcile_RunningPhase_UpdatesStatefulSetImage(t *testing.T) {
 	node.Status.Phase = seiv1alpha1.PhaseRunning
 
 	// Pre-create a StatefulSet with the old image.
-	oldSts := generateNodeStatefulSet(node, platformtest.Config())
+	oldSts := noderesource.GenerateStatefulSet(node, platformtest.Config())
 	oldSts.SetGroupVersionKind(appsv1.SchemeGroupVersion.WithKind("StatefulSet"))
 
 	r, c := newNodeReconciler(t, node, oldSts)
@@ -500,7 +501,7 @@ func TestNodeDeletion_SnapshotNode_WithoutRetain_DeletesPVC(t *testing.T) {
 		ObjectMeta: metav1.ObjectMeta{
 			Name:      "data-snap-0",
 			Namespace: "default",
-			Labels:    resourceLabelsForNode(node),
+			Labels:    noderesource.ResourceLabels(node),
 		},
 	}
internal/controller/node/sizing.go

Lines changed: 0 additions & 56 deletions
This file was deleted.
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+package node
+
+import (
+	corev1 "k8s.io/api/core/v1"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+
+	seiv1alpha1 "github.com/sei-protocol/sei-k8s-controller/api/v1alpha1"
+)
+
+func newGenesisNode(name, namespace string) *seiv1alpha1.SeiNode { //nolint:unparam // test helper designed for reuse
+	return &seiv1alpha1.SeiNode{
+		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
+		Spec: seiv1alpha1.SeiNodeSpec{
+			ChainID: "sei-test",
+			Image:   "ghcr.io/sei-protocol/seid:latest",
+			Entrypoint: &seiv1alpha1.EntrypointConfig{
+				Command: []string{"seid"},
+				Args:    []string{"start"},
+			},
+			Validator: &seiv1alpha1.ValidatorSpec{},
+			Sidecar:   &seiv1alpha1.SidecarConfig{Port: 7777},
+		},
+	}
+}
+
+func newSnapshotNode(name, namespace string) *seiv1alpha1.SeiNode { //nolint:unparam // test helper designed for reuse
+	return &seiv1alpha1.SeiNode{
+		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
+		Spec: seiv1alpha1.SeiNodeSpec{
+			ChainID: "sei-test",
+			Image:   "ghcr.io/sei-protocol/seid:latest",
+			FullNode: &seiv1alpha1.FullNodeSpec{
+				Snapshot: &seiv1alpha1.SnapshotSource{
+					S3: &seiv1alpha1.S3SnapshotSource{
+						TargetHeight: 100000000,
+					},
+					TrustPeriod: "9999h0m0s",
+				},
+			},
+			Sidecar: &seiv1alpha1.SidecarConfig{Port: 7777},
+		},
+	}
+}
+
+func findContainer(containers []corev1.Container, name string) *corev1.Container { //nolint:unparam // test helper designed for reuse
+	for i := range containers {
+		if containers[i].Name == name {
+			return &containers[i]
+		}
+	}
+	return nil
+}
