Commit 4c98217

Add efficient pod completion wait to hotloop role

To avoid long waits when tempest tests fail.

- Add hotloop_wait_pod_completion module for immediate failure detection
- Replace 60m timeout with smart polling that exits on Success/Failure

Assisted-By: claude-4-sonnet
1 parent a8936e1 commit 4c98217

6 files changed

Lines changed: 381 additions & 1 deletion
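The commit message describes replacing a fixed 60-minute wait with polling that exits as soon as the pod succeeds or fails. The `hotloop_wait_pod_completion` module itself is not among the files shown here, so the following is only a minimal sketch of that idea under stated assumptions: `wait_for_pod_completion`, `get_phase`, `clock` and `sleep` are hypothetical names, and `get_phase` stands in for whatever call returns the pod's `status.phase`. The defaults mirror the documented `timeout` (3600) and `poll_interval` (10).

```python
import time

# Pod phases after which the pod will not run any further.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def wait_for_pod_completion(get_phase, timeout=3600, poll_interval=10,
                            clock=time.monotonic, sleep=time.sleep):
    """Poll get_phase() until the pod is terminal or the timeout expires.

    get_phase is a hypothetical callable returning the pod's phase string
    (for example "Pending", "Running", "Succeeded" or "Failed").
    Returns the terminal phase, or raises TimeoutError.
    """
    deadline = clock() + timeout
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            # Exit immediately on success or failure.
            return phase
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(
                "pod still in phase {0!r} after {1}s".format(phase, timeout)
            )
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

With this shape, a pod that fails on the third status check returns after two poll intervals instead of holding the pipeline for the full timeout. Injecting `clock` and `sleep` is a design choice made here so the loop can be exercised without real waiting.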

docs/hotloop_lang.md

Lines changed: 18 additions & 0 deletions
@@ -92,6 +92,16 @@ Here's a breakdown of the common attributes within a stage:
   are typically `oc wait` commands in the context of OpenShift, ensuring that
   resources are created, become ready, or reach a desired state before the
   pipeline proceeds. Each item in the list is a command-line string.
+- **`wait_pod_completion`**: (Optional) A list of pod completion wait configurations
+  that efficiently wait for a single pod to reach terminal states (Succeeded or Failed).
+  This provides faster failure detection compared to traditional `oc wait` commands
+  with long timeouts. Each item must define:
+  - **`namespace`**: The Kubernetes namespace to search for pods.
+  - **`labels`**: Label selectors to identify the pod to wait for. Must match
+    exactly one pod.
+  - **`timeout`**: (Optional) Maximum time to wait in seconds. Defaults to 3600.
+  - **`poll_interval`**: (Optional) Interval between status checks in seconds.
+    Defaults to 10.
 - **`run_conditions`**: (Optional) A list of conditions that must be met for a
   stage to execute. Strings `False`, `FALSE` and `false` will be evaluated as
   `False`, otherwise the python boolean equivalent of the value.
@@ -235,6 +245,14 @@ customize the manifest content dynamically.
       - >-
         oc wait -n metallb-system pod -l component=speaker --for condition=Ready
         --timeout=300s"
+    wait_pod_completion:
+      - namespace: openstack
+        labels:
+          operator: test-operator
+          service: tempest
+          workflowStep: "0"
+        timeout: 3600
+        poll_interval: 15
 ```

 Here, the `manifest` stage applies the YAML file located at

docs/hotstack_scenarios.md

Lines changed: 5 additions & 0 deletions
@@ -86,6 +86,11 @@ Role.
   command-line tool) to execute. These commands poll the cluster until
   a specific condition is met, ensuring that resources are ready before
   the pipeline proceeds.
+- `wait_pod_completion`: A list of pod completion wait configurations
+  that efficiently wait for pods to reach terminal states (Succeeded or
+  Failed). This provides faster failure detection compared to traditional
+  `oc wait` commands with long timeouts, particularly useful for test
+  execution pods that may fail early.

 ### `manifests/` (Directory)

roles/hotloop/README.md

Lines changed: 29 additions & 1 deletion
@@ -72,6 +72,15 @@ Schema for a stage item is:

 * `wait_conditions` (list) A list of commands to run after applying the
   manifest, i.e `oc wait --for <condition>`
+* `wait_pod_completion` (list) A list of pod completion wait configurations
+  that efficiently wait for a single pod to reach terminal states (Succeeded or Failed).
+  Each item must define:
+  * `namespace`: (string) The Kubernetes namespace to search for pods.
+  * `labels`: (dict) Label selectors to identify the pod to wait for. Must match
+    exactly one pod.
+  * `timeout`: (int, optional) Maximum time to wait in seconds. Defaults to 3600.
+  * `poll_interval`: (int, optional) Interval between status checks in seconds.
+    Defaults to 10.
 * `run_conditions` (list) A list of conditions that must be met for a stage
   to execute. Strings `False`, `FALSE` and `false` will be evaluated as
   `False`, otherwise the python boolean equivalent of the value.
@@ -116,7 +125,7 @@ Schema for a stage item is:

 > **_NOTE_**: Stage items are applied the actions in the following order:
 > `command` -> `shell` -> `manifest` -> `j2_manifest` -> `kustomize` ->
-> `wait_conditions` -> `stages`.
+> `wait_conditions` -> `wait_pod_completion` -> `stages`.

 Example:

@@ -162,6 +171,25 @@ stages:
       - "oc wait -n openstack-operators -l app.kubernetes.io/name=rabbitmq-cluster-operator deployment --for condition=Available --timeout=300s"
       - "oc wait -n openstack-operators -l app.kubernetes.io/instance=webhook-service service --for jsonpath='{.status.loadBalancer}' --timeout=300s"

+  - name: Run tempest tests
+    documentation: |
+      Execute comprehensive OpenStack validation tests using the Tempest framework.
+      Wait for pod completion efficiently without long timeouts on failure.
+      The label selectors must uniquely identify a single pod.
+    manifest: tempest-tests.yml
+    wait_conditions:
+      - >-
+        oc wait -n openstack tempests.test.openstack.org tempest-tests
+        --for condition=ServiceConfigReady --timeout=120s
+    wait_pod_completion:
+      - namespace: openstack
+        labels:
+          operator: test-operator
+          service: tempest
+          workflowStep: "0"
+        timeout: 3600
+        poll_interval: 15
+
   - name: Common MetalLB
     manifest: ../common/metallb.yaml
     wait_conditions:
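
The README states that `labels` must match exactly one pod. How the module performs that lookup is not shown in this diff, so the sketch below only illustrates the two ideas involved: rendering a labels dict as a Kubernetes equality-based selector string (comma-separated `key=value` pairs, the form accepted by `oc`/`kubectl -l`), and rejecting an ambiguous match. Both function names are hypothetical, and `pod_names` stands in for whatever list the cluster query returns.

```python
def labels_to_selector(labels):
    """Render a labels dict as an equality-based selector string."""
    return ",".join(
        "{0}={1}".format(key, value) for key, value in sorted(labels.items())
    )


def select_single_pod(pod_names, selector):
    """Return the only matching pod name, or raise if the match is not unique."""
    if len(pod_names) != 1:
        raise ValueError(
            "selector {0!r} matched {1} pods, expected exactly 1".format(
                selector, len(pod_names)
            )
        )
    return pod_names[0]


# Labels taken from the "Run tempest tests" stage in the README example.
selector = labels_to_selector(
    {"operator": "test-operator", "service": "tempest", "workflowStep": "0"}
)
print(selector)  # operator=test-operator,service=tempest,workflowStep=0
```

Sorting the keys is a choice made here only to keep the output deterministic; the order of terms does not change which pods a selector matches.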

roles/hotloop/library/hotloop_stage_loader.py

Lines changed: 75 additions & 0 deletions
@@ -101,6 +101,7 @@
     "shell",
     "stages",
     "wait_conditions",
+    "wait_pod_completion",
 }

 ALLOWED_KUSTOMIZE_KEYS = {
@@ -161,6 +162,72 @@ def _validate_run_conditions(conditions):
             )


+def _validate_wait_pod_completion(wait_pod_completion_list):
+    """Validates the 'wait_pod_completion' parameter.
+
+    This function checks if the 'wait_pod_completion' parameter is a list
+    of properly structured pod completion wait configurations.
+
+    :param wait_pod_completion_list: The 'wait_pod_completion' parameter to validate.
+    """
+    if not isinstance(wait_pod_completion_list, list):
+        raise TypeError(
+            "'wait_pod_completion' must be a list, got {wait_type}".format(
+                wait_type=type(wait_pod_completion_list)
+            )
+        )
+
+    for i, wait_config in enumerate(wait_pod_completion_list):
+        if not isinstance(wait_config, dict):
+            raise TypeError(
+                "wait_pod_completion[{index}] must be a dict, got {config_type}".format(
+                    index=i, config_type=type(wait_config)
+                )
+            )
+
+        # Check required fields
+        required_fields = {"namespace", "labels"}
+        missing_fields = required_fields - wait_config.keys()
+        if missing_fields:
+            raise ValueError(
+                "wait_pod_completion[{index}] missing required fields: {missing}".format(
+                    index=i, missing=missing_fields
+                )
+            )
+
+        # Validate field types
+        if not isinstance(wait_config["namespace"], str):
+            raise TypeError(
+                "wait_pod_completion[{index}].namespace must be a string, got {ns_type}".format(
+                    index=i, ns_type=type(wait_config["namespace"])
+                )
+            )
+
+        if not isinstance(wait_config["labels"], dict):
+            raise TypeError(
+                "wait_pod_completion[{index}].labels must be a dict, got {labels_type}".format(
+                    index=i, labels_type=type(wait_config["labels"])
+                )
+            )
+
+        # Validate optional fields
+        if "timeout" in wait_config and not isinstance(wait_config["timeout"], int):
+            raise TypeError(
+                "wait_pod_completion[{index}].timeout must be an integer, got {timeout_type}".format(
+                    index=i, timeout_type=type(wait_config["timeout"])
+                )
+            )
+
+        if "poll_interval" in wait_config and not isinstance(
+            wait_config["poll_interval"], int
+        ):
+            raise TypeError(
+                "wait_pod_completion[{index}].poll_interval must be an integer, got {poll_type}".format(
+                    index=i, poll_type=type(wait_config["poll_interval"])
+                )
+            )
+
+
 def _validate_kustomize(kustomize_config):
     """Validates the 'kustomize' parameter.
@@ -226,6 +293,11 @@ def _validate_stage(stage, nested=False):
     if not isinstance(stage.get("wait_conditions", []), list):
         raise ValueError("Wait conditions must be a list, {stage}".format(stage=stage))

+    if not isinstance(stage.get("wait_pod_completion", []), list):
+        raise ValueError(
+            "Wait pod completion must be a list, {stage}".format(stage=stage)
+        )
+
     if nested and "stages" in stage:
         raise ValueError("Nested stages cannot be nested, {stage}".format(stage=stage))

@@ -235,6 +307,9 @@ def _validate_stage(stage, nested=False):
     if "kustomize" in stage:
         _validate_kustomize(stage["kustomize"])

+    if "wait_pod_completion" in stage:
+        _validate_wait_pod_completion(stage["wait_pod_completion"])
+

 def _load_nested(stages):
     """Load and validates nested stages
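
One behaviour of the `isinstance(..., int)` checks in `_validate_wait_pod_completion` is worth noting: in Python, `bool` is a subclass of `int`, so a YAML value such as `timeout: true` would pass the type check, as would zero or negative numbers. Whether that matters depends on how the wait module handles such values, which this diff does not show. The helper below is a hypothetical stricter check, not part of the commit, sketched only to illustrate the difference.

```python
def is_positive_int(value):
    """True only for real integers greater than zero (bools are rejected)."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


print(isinstance(True, int))   # True: bool is a subclass of int
print(is_positive_int(True))   # False
print(is_positive_int(3600))   # True
print(is_positive_int(0))      # False
```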
