Skip to content

Commit f36c136

Browse files
stuggiclaude
andcommitted
Fix MinorUpdateOVNDataplane race when OVN image unchanged between versions
Commit 63fd705 removed the nodeset.IsReady() check from DataplaneNodesetsOVNControllerImagesMatch to fix the minor update workflow getting stuck when unrelated deployments were running. That check was too strict — it blocked when any deployment was in progress, even if the OVN update had already completed. However, the remaining pure image comparison is too loose. When the OVN controller image does not change between two OpenStack versions, the nodeset already has the matching image from the previous update. The OpenStackVersion controller then sets MinorUpdateOVNDataplane=True immediately, before the edpm-ovn-update deployment finishes. This causes the subsequent minor update steps (controlplane update, edpm services update) to proceed while the OVN dataplane deployment is still running — resulting in both dataplane deployments running concurrently. Fix this with two mechanisms: 1. IsDataplaneDeploymentRunningForServiceType: a new generic function that lists all in-progress OpenStackDataPlaneDeployment resources, resolves which services each deploys (from ServicesOverride or the nodeset's service list), and checks the service's EDPMServiceType to identify deployments of a given type (e.g. "ovn"). EDPMServiceType is a fixed identifier on the service spec, independent of the service or deployment resource name. 2. Saved condition state tracking: when the OVN image is unchanged between the deployed and target versions, image comparison alone cannot distinguish "already updated" from "not yet updated." To handle this, we use the saved condition state (captured before Init resets conditions each reconciliation) to track whether a running OVN deployment has been observed during this update cycle: - Running OVN deployment seen → set condition False(RequestedReason) - No running deployment + previous condition was RequestedReason → deployment completed, proceed - No running deployment + previous condition was NOT RequestedReason → deployment not created yet, keep waiting The OpenStackVersion controller watches nodesets, so when a deployment starts (nodeset becomes not-Ready) and completes (nodeset becomes Ready), reconciliation is triggered, ensuring we observe the running deployment at least once. When the OVN image differs between versions, the existing image match check is sufficient — the nodeset's ContainerImages are only updated on successful deployment completion, so a match proves the deployment ran. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
1 parent f987509 commit f36c136

2 files changed

Lines changed: 147 additions & 0 deletions

File tree

internal/controller/core/openstackversion_controller.go

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -291,6 +291,75 @@ func (r *OpenStackVersionReconciler) Reconcile(ctx context.Context, req ctrl.Req
291291
Log.Info("Waiting on OVN Dataplane updates to complete")
292292
return ctrl.Result{}, nil
293293
}
294+
295+
// When the OVN controller image is the same between the deployed
296+
// version and the target version, the image comparison above always
297+
// passes because the nodeset already has the matching image from
298+
// the previous update. In this case we need additional checks to
299+
// confirm the OVN dataplane deployment for this update cycle has
300+
// actually completed.
301+
//
302+
// We use the saved condition state (from before Init reset) to
303+
// track whether we have observed a running OVN deployment during
304+
// this update cycle:
305+
// - If we see a running OVN deployment now: set condition False
306+
// (RequestedReason) to record that we observed one
307+
// - If no running OVN deployment AND the previous condition was
308+
// False/RequestedReason: the deployment we saw previously has
309+
// completed → proceed (fall through to set True)
310+
// - If no running OVN deployment AND the previous condition was
311+
// NOT False/RequestedReason (e.g. still Unknown from Init):
312+
// we haven't seen a deployment yet → keep waiting
313+
//
314+
// When the image differs between versions, the image match alone
315+
// is sufficient proof that a deployment updated it, since the
316+
// nodeset's ContainerImages are only set on successful completion.
317+
deployedDefaults, hasDeployedDefaults := instance.Status.ContainerImageVersionDefaults[*instance.Status.DeployedVersion]
318+
if hasDeployedDefaults &&
319+
deployedDefaults.OvnControllerImage != nil &&
320+
instance.Status.ContainerImages.OvnControllerImage != nil &&
321+
*deployedDefaults.OvnControllerImage == *instance.Status.ContainerImages.OvnControllerImage {
322+
323+
ovnDeploymentRunning, err := openstack.IsDataplaneDeploymentRunningForServiceType(
324+
ctx, versionHelper, instance.Namespace, dataplaneNodesets, "ovn")
325+
if err != nil {
326+
return ctrl.Result{}, err
327+
}
328+
329+
if ovnDeploymentRunning {
330+
// OVN deployment is actively running — record this in
331+
// the condition so we can detect its completion later.
332+
instance.Status.Conditions.Set(condition.FalseCondition(
333+
corev1beta1.OpenStackVersionMinorUpdateOVNDataplane,
334+
condition.RequestedReason,
335+
condition.SeverityInfo,
336+
corev1beta1.OpenStackVersionMinorUpdateReadyRunningMessage))
337+
Log.Info("Waiting on OVN Dataplane deployment to complete (OVN image unchanged between versions)")
338+
return ctrl.Result{}, nil
339+
}
340+
341+
// No OVN deployment running. Check the saved condition state
342+
// from the previous reconciliation to determine if we ever
343+
// observed one running during this update cycle.
344+
prevOvnDataplaneCond := savedConditions.Get(corev1beta1.OpenStackVersionMinorUpdateOVNDataplane)
345+
if prevOvnDataplaneCond == nil ||
346+
prevOvnDataplaneCond.Reason != condition.RequestedReason {
347+
// We have never observed a running OVN deployment in
348+
// this update cycle — the deployment has not been
349+
// created yet. Keep waiting.
350+
instance.Status.Conditions.Set(condition.FalseCondition(
351+
corev1beta1.OpenStackVersionMinorUpdateOVNDataplane,
352+
condition.InitReason,
353+
condition.SeverityInfo,
354+
corev1beta1.OpenStackVersionMinorUpdateReadyRunningMessage))
355+
Log.Info("Waiting for OVN Dataplane deployment to be created (OVN image unchanged between versions)")
356+
return ctrl.Result{}, nil
357+
}
358+
// Previously saw a running OVN deployment (condition was
359+
// False/RequestedReason), now no OVN deployment is running
360+
// → the deployment has completed. Fall through to set True.
361+
Log.Info("OVN Dataplane deployment completed (OVN image unchanged between versions)")
362+
}
294363
}
295364
instance.Status.Conditions.MarkTrue(
296365
corev1beta1.OpenStackVersionMinorUpdateOVNDataplane,

internal/openstack/dataplane.go

Lines changed: 78 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ import (
77
corev1beta1 "github.com/openstack-k8s-operators/openstack-operator/api/core/v1beta1"
88

99
dataplanev1 "github.com/openstack-k8s-operators/openstack-operator/api/dataplane/v1beta1"
10+
"k8s.io/apimachinery/pkg/types"
1011
"sigs.k8s.io/controller-runtime/pkg/client"
1112
)
1213

@@ -58,6 +59,83 @@ func DataplaneNodesetsOVNControllerImagesMatch(version *corev1beta1.OpenStackVer
5859
return true
5960
}
6061

62+
// IsDataplaneDeploymentRunningForServiceType checks whether any in-progress
63+
// OpenStackDataPlaneDeployment is deploying a service with the given
64+
// EDPMServiceType (e.g. "ovn"). It resolves which services each deployment
65+
// runs (from ServicesOverride or the nodeset's service list) and inspects
66+
// the service's EDPMServiceType to determine if it matches.
67+
func IsDataplaneDeploymentRunningForServiceType(
68+
ctx context.Context,
69+
h *helper.Helper,
70+
namespace string,
71+
dataplaneNodesets *dataplanev1.OpenStackDataPlaneNodeSetList,
72+
serviceType string,
73+
) (bool, error) {
74+
// List all deployments in the namespace
75+
deployments := &dataplanev1.OpenStackDataPlaneDeploymentList{}
76+
opts := []client.ListOption{
77+
client.InNamespace(namespace),
78+
}
79+
err := h.GetClient().List(ctx, deployments, opts...)
80+
if err != nil {
81+
return false, err
82+
}
83+
84+
// Build a map of nodeset name -> nodeset for quick lookup
85+
nodesetMap := make(map[string]*dataplanev1.OpenStackDataPlaneNodeSet, len(dataplaneNodesets.Items))
86+
for i := range dataplaneNodesets.Items {
87+
nodesetMap[dataplaneNodesets.Items[i].Name] = &dataplaneNodesets.Items[i]
88+
}
89+
90+
// Cache service lookups to avoid repeated API calls
91+
serviceCache := make(map[string]*dataplanev1.OpenStackDataPlaneService)
92+
93+
for _, deployment := range deployments.Items {
94+
// Skip completed deployments
95+
if deployment.Status.Deployed {
96+
continue
97+
}
98+
99+
// Determine which services this deployment runs for each of its nodesets
100+
for _, nodesetName := range deployment.Spec.NodeSets {
101+
nodeset, exists := nodesetMap[nodesetName]
102+
if !exists || len(nodeset.Spec.Nodes) == 0 {
103+
continue
104+
}
105+
106+
var services []string
107+
if len(deployment.Spec.ServicesOverride) != 0 {
108+
services = deployment.Spec.ServicesOverride
109+
} else {
110+
services = nodeset.Spec.Services
111+
}
112+
113+
for _, serviceName := range services {
114+
svc, cached := serviceCache[serviceName]
115+
if !cached {
116+
foundService := &dataplanev1.OpenStackDataPlaneService{}
117+
err := h.GetClient().Get(ctx, types.NamespacedName{
118+
Name: serviceName,
119+
Namespace: namespace,
120+
}, foundService)
121+
if err != nil {
122+
// Service not found — skip it
123+
continue
124+
}
125+
svc = foundService
126+
serviceCache[serviceName] = svc
127+
}
128+
129+
if svc.Spec.EDPMServiceType == serviceType {
130+
return true, nil
131+
}
132+
}
133+
}
134+
}
135+
136+
return false, nil
137+
}
138+
61139
// DataplaneNodesetsDeployed returns true if all nodesets are deployed with the latest version
62140
func DataplaneNodesetsDeployed(version *corev1beta1.OpenStackVersion, dataplaneNodesets *dataplanev1.OpenStackDataPlaneNodeSetList) bool {
63141
for _, nodeset := range dataplaneNodesets.Items {

0 commit comments

Comments
 (0)