Skip to content

Commit 34708c1

Browse files
stuggiclaude
andcommitted
Fix MinorUpdateOVNDataplane race when OVN image unchanged between versions
Commit 63fd705 removed the nodeset.IsReady() check from DataplaneNodesetsOVNControllerImagesMatch to fix the minor update workflow getting stuck when unrelated deployments were running. That check was too strict — it blocked when any deployment was in progress, even if the OVN update had already completed. However, the remaining pure image comparison is too loose. When the OVN controller image does not change between two OpenStack versions, the nodeset already has the matching image from the previous update. The OpenStackVersion controller then sets MinorUpdateOVNDataplane=True immediately, before the edpm-ovn-update deployment finishes. This causes the subsequent minor update steps (controlplane update, edpm services update) to proceed while the OVN dataplane deployment is still running — resulting in both dataplane deployments running concurrently. Fix this with two mechanisms: 1. IsDataplaneDeploymentRunningForContainerImage: a new generic function that lists all in-progress OpenStackDataPlaneDeployment resources, resolves which services each deploys (from ServicesOverride or the nodeset's service list), and inspects each service's ContainerImageFields to determine if it manages the specified container image. This does not rely on deployment or service naming conventions and works with custom services. 2. Saved condition state tracking: when the OVN image is unchanged between the deployed and target versions, image comparison alone cannot distinguish "already updated" from "not yet updated." To handle this, we use the saved condition state (captured before Init resets conditions each reconciliation) to track whether a running OVN deployment has been observed during this update cycle: - Running OVN deployment seen → set condition False(RequestedReason) - No running deployment + previous condition was RequestedReason → deployment completed, proceed - No running deployment + previous condition was NOT RequestedReason → deployment not created yet, keep waiting The OpenStackVersion controller watches nodesets, so when a deployment starts (nodeset becomes not-Ready) and completes (nodeset becomes Ready), reconciliation is triggered, ensuring we observe the running deployment at least once. When the OVN image differs between versions, the existing image match check is sufficient — the nodeset's ContainerImages are only updated on successful deployment completion, so a match proves the deployment ran. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
1 parent f987509 commit 34708c1

2 files changed

Lines changed: 149 additions & 0 deletions

File tree

internal/controller/core/openstackversion_controller.go

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -291,6 +291,75 @@ func (r *OpenStackVersionReconciler) Reconcile(ctx context.Context, req ctrl.Req
291291
Log.Info("Waiting on OVN Dataplane updates to complete")
292292
return ctrl.Result{}, nil
293293
}
294+
295+
// When the OVN controller image is the same between the deployed
296+
// version and the target version, the image comparison above always
297+
// passes because the nodeset already has the matching image from
298+
// the previous update. In this case we need additional checks to
299+
// confirm the OVN dataplane deployment for this update cycle has
300+
// actually completed.
301+
//
302+
// We use the saved condition state (from before Init reset) to
303+
// track whether we have observed a running OVN deployment during
304+
// this update cycle:
305+
// - If we see a running OVN deployment now: set condition False
306+
// (RequestedReason) to record that we observed one
307+
// - If no running OVN deployment AND the previous condition was
308+
// False/RequestedReason: the deployment we saw previously has
309+
// completed → proceed (fall through to set True)
310+
// - If no running OVN deployment AND the previous condition was
311+
// NOT False/RequestedReason (e.g. still Unknown from Init):
312+
// we haven't seen a deployment yet → keep waiting
313+
//
314+
// When the image differs between versions, the image match alone
315+
// is sufficient proof that a deployment updated it, since the
316+
// nodeset's ContainerImages are only set on successful completion.
317+
deployedDefaults, hasDeployedDefaults := instance.Status.ContainerImageVersionDefaults[*instance.Status.DeployedVersion]
318+
if hasDeployedDefaults &&
319+
deployedDefaults.OvnControllerImage != nil &&
320+
instance.Status.ContainerImages.OvnControllerImage != nil &&
321+
*deployedDefaults.OvnControllerImage == *instance.Status.ContainerImages.OvnControllerImage {
322+
323+
ovnDeploymentRunning, err := openstack.IsDataplaneDeploymentRunningForContainerImage(
324+
ctx, versionHelper, instance.Namespace, dataplaneNodesets, "OvnControllerImage")
325+
if err != nil {
326+
return ctrl.Result{}, err
327+
}
328+
329+
if ovnDeploymentRunning {
330+
// OVN deployment is actively running — record this in
331+
// the condition so we can detect its completion later.
332+
instance.Status.Conditions.Set(condition.FalseCondition(
333+
corev1beta1.OpenStackVersionMinorUpdateOVNDataplane,
334+
condition.RequestedReason,
335+
condition.SeverityInfo,
336+
corev1beta1.OpenStackVersionMinorUpdateReadyRunningMessage))
337+
Log.Info("Waiting on OVN Dataplane deployment to complete (OVN image unchanged between versions)")
338+
return ctrl.Result{}, nil
339+
}
340+
341+
// No OVN deployment running. Check the saved condition state
342+
// from the previous reconciliation to determine if we ever
343+
// observed one running during this update cycle.
344+
prevOvnDataplaneCond := savedConditions.Get(corev1beta1.OpenStackVersionMinorUpdateOVNDataplane)
345+
if prevOvnDataplaneCond == nil ||
346+
prevOvnDataplaneCond.Reason != condition.RequestedReason {
347+
// We have never observed a running OVN deployment in
348+
// this update cycle — the deployment has not been
349+
// created yet. Keep waiting.
350+
instance.Status.Conditions.Set(condition.FalseCondition(
351+
corev1beta1.OpenStackVersionMinorUpdateOVNDataplane,
352+
condition.InitReason,
353+
condition.SeverityInfo,
354+
corev1beta1.OpenStackVersionMinorUpdateReadyRunningMessage))
355+
Log.Info("Waiting for OVN Dataplane deployment to be created (OVN image unchanged between versions)")
356+
return ctrl.Result{}, nil
357+
}
358+
// Previously saw a running OVN deployment (condition was
359+
// False/RequestedReason), now no OVN deployment is running
360+
// → the deployment has completed. Fall through to set True.
361+
Log.Info("OVN Dataplane deployment completed (OVN image unchanged between versions)")
362+
}
294363
}
295364
instance.Status.Conditions.MarkTrue(
296365
corev1beta1.OpenStackVersionMinorUpdateOVNDataplane,

internal/openstack/dataplane.go

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,11 +2,13 @@ package openstack
22

33
import (
44
"context"
5+
"slices"
56

67
"github.com/openstack-k8s-operators/lib-common/modules/common/helper"
78
corev1beta1 "github.com/openstack-k8s-operators/openstack-operator/api/core/v1beta1"
89

910
dataplanev1 "github.com/openstack-k8s-operators/openstack-operator/api/dataplane/v1beta1"
11+
"k8s.io/apimachinery/pkg/types"
1012
"sigs.k8s.io/controller-runtime/pkg/client"
1113
)
1214

@@ -58,6 +60,84 @@ func DataplaneNodesetsOVNControllerImagesMatch(version *corev1beta1.OpenStackVer
5860
return true
5961
}
6062

63+
// IsDataplaneDeploymentRunningForContainerImage checks whether any in-progress
64+
// OpenStackDataPlaneDeployment is deploying a service that manages the given
65+
// containerImageField (e.g. "OvnControllerImage"). It resolves which services
66+
// each deployment runs (from ServicesOverride or the nodeset's service list)
67+
// and inspects the service's ContainerImageFields to determine if it manages
68+
// the specified container image.
69+
func IsDataplaneDeploymentRunningForContainerImage(
70+
ctx context.Context,
71+
h *helper.Helper,
72+
namespace string,
73+
dataplaneNodesets *dataplanev1.OpenStackDataPlaneNodeSetList,
74+
containerImageField string,
75+
) (bool, error) {
76+
// List all deployments in the namespace
77+
deployments := &dataplanev1.OpenStackDataPlaneDeploymentList{}
78+
opts := []client.ListOption{
79+
client.InNamespace(namespace),
80+
}
81+
err := h.GetClient().List(ctx, deployments, opts...)
82+
if err != nil {
83+
return false, err
84+
}
85+
86+
// Build a map of nodeset name -> nodeset for quick lookup
87+
nodesetMap := make(map[string]*dataplanev1.OpenStackDataPlaneNodeSet, len(dataplaneNodesets.Items))
88+
for i := range dataplaneNodesets.Items {
89+
nodesetMap[dataplaneNodesets.Items[i].Name] = &dataplaneNodesets.Items[i]
90+
}
91+
92+
// Cache service lookups to avoid repeated API calls
93+
serviceCache := make(map[string]*dataplanev1.OpenStackDataPlaneService)
94+
95+
for _, deployment := range deployments.Items {
96+
// Skip completed deployments
97+
if deployment.Status.Deployed {
98+
continue
99+
}
100+
101+
// Determine which services this deployment runs for each of its nodesets
102+
for _, nodesetName := range deployment.Spec.NodeSets {
103+
nodeset, exists := nodesetMap[nodesetName]
104+
if !exists || len(nodeset.Spec.Nodes) == 0 {
105+
continue
106+
}
107+
108+
var services []string
109+
if len(deployment.Spec.ServicesOverride) != 0 {
110+
services = deployment.Spec.ServicesOverride
111+
} else {
112+
services = nodeset.Spec.Services
113+
}
114+
115+
for _, serviceName := range services {
116+
svc, cached := serviceCache[serviceName]
117+
if !cached {
118+
foundService := &dataplanev1.OpenStackDataPlaneService{}
119+
err := h.GetClient().Get(ctx, types.NamespacedName{
120+
Name: serviceName,
121+
Namespace: namespace,
122+
}, foundService)
123+
if err != nil {
124+
// Service not found — skip it
125+
continue
126+
}
127+
svc = foundService
128+
serviceCache[serviceName] = svc
129+
}
130+
131+
if slices.Contains(svc.Spec.ContainerImageFields, containerImageField) {
132+
return true, nil
133+
}
134+
}
135+
}
136+
}
137+
138+
return false, nil
139+
}
140+
61141
// DataplaneNodesetsDeployed returns true if all nodesets are deployed with the latest version
62142
func DataplaneNodesetsDeployed(version *corev1beta1.OpenStackVersion, dataplaneNodesets *dataplanev1.OpenStackDataPlaneNodeSetList) bool {
63143
for _, nodeset := range dataplaneNodesets.Items {

0 commit comments

Comments
 (0)