Commit def4508

myakove and yossisegev authored
fix: Avoid race when removing interfaces via NNCP [4.17] (#2628)
* fix: Avoid race when removing interfaces via NNCP [4.17]

  Removing an interface that was created using an NNCP is done by editing the same NNCP. This sometimes resulted in a race in which the NNCP success status actually reflected the previous status, so the NNCP was deleted before the configuration was completed. That left dangling interfaces on the cluster nodes, with node-native interfaces still occupied as the ports of these test-created interfaces. A recent PR made this failing flow occur every time.

  This PR aims to ensure that the timestamp of the AVAILABLE status is updated for the most recent change (the interface removal) and not for the previous change (setup or modification).

* Satisfy pre-commit and flake requirements

* Now I have a clear picture of the changes:

  1. **pyproject.toml**: Fixed `requires-python` → `python` key, updated `packaging` version constraint
  2. **tox.ini**: Updated basepython from `python3` to `python3.13` across all test environments
  3. **poetry.lock**: Regenerated with Poetry 2.2.1 (adds groups/markers metadata)

  chore: update Python tooling for Poetry 2.x and Python 3.13

  - Fix pyproject.toml: requires-python → python key
  - Bump packaging dependency to >=24.0
  - Update tox.ini basepython to python3.13
  - Regenerate poetry.lock with Poetry 2.2.1 format

---------

Co-authored-by: Yossi Segev <40713576+yossisegev@users.noreply.github.com>
Co-authored-by: Yossi Segev <ysegev@redhat.com>
1 parent 362e5b9 commit def4508
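
The race described in the commit message comes down to telling a stale AVAILABLE condition apart from a fresh one. A minimal sketch of that comparison follows, using plain dictionaries instead of a live NNCP. The function name `available_since`, the literal strings `"Available"` and `"True"`, and the sample timestamps are assumptions made for illustration; they are not taken from the library.

```python
from datetime import datetime

# Timestamp layout used for lastTransitionTime in the diff below.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def available_since(conditions, initial_transition_time):
    """Return True if an Available=True condition is newer than the given timestamp."""
    initial = datetime.strptime(initial_transition_time, DATE_FORMAT)
    for condition in conditions:
        if (
            condition["type"] == "Available"  # assumed literal value
            and condition["status"] == "True"  # assumed literal value
            and datetime.strptime(condition["lastTransitionTime"], DATE_FORMAT) > initial
        ):
            return True
    return False


# Hypothetical condition lists, shaped like an NNCP's status.conditions.
initial = "2024-01-01T10:00:00Z"
stale = [{"type": "Available", "status": "True", "lastTransitionTime": "2024-01-01T10:00:00Z"}]
fresh = [{"type": "Available", "status": "True", "lastTransitionTime": "2024-01-01T10:00:07Z"}]

stale_result = available_since(stale, initial)
fresh_result = available_since(fresh, initial)
```

With the stale list the timestamp is unchanged, so the check reports that the status still belongs to the earlier change; only the fresh list, whose timestamp moved forward, passes.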

5 files changed

Lines changed: 131 additions & 21 deletions

File tree

.flake8

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ fcn_exclude_functions =
     Path,
     writelines,
     submit,
+    datetime,
 
 enable-extensions =
     FCN,

ocp_resources/node_network_configuration_policy.py

Lines changed: 36 additions & 2 deletions
@@ -1,15 +1,16 @@
 import re
+from datetime import datetime
 
 from kubernetes.dynamic.exceptions import ConflictError
 
-from ocp_resources.constants import TIMEOUT_4MINUTES
+from ocp_resources.constants import TIMEOUT_1MINUTE, TIMEOUT_4MINUTES, TIMEOUT_5SEC
 from ocp_resources.node import Node
 from ocp_resources.node_network_configuration_enactment import (
     NodeNetworkConfigurationEnactment,
 )
 from ocp_resources.node_network_state import NodeNetworkState
 from ocp_resources.resource import Resource, ResourceEditor
-from timeout_sampler import TimeoutExpiredError, TimeoutSampler, TimeoutWatch
+from timeout_sampler import TimeoutExpiredError, TimeoutSampler, TimeoutWatch, retry
 
 IPV4_STR = "ipv4"
 IPV6_STR = "ipv6"
@@ -325,10 +326,43 @@ def _absent_interface(self):
         if self.ports:
             self.add_ports()
 
+        # The current time-stamp of the NNCP's available status will change after the NNCP is updated, therefore
+        # it must be fetched and stored before the update, and compared with the new time-stamp after.
+        initial_success_status_time = self._get_last_successful_transition_time()
         ResourceEditor(
             patches={self: {"spec": {"desiredState": {"interfaces": self.desired_state["interfaces"]}}}}
         ).update()
 
+        # If the NNCP failed on setup, then its tear-down AVAILABLE status will necessarily be the first.
+        if initial_success_status_time:
+            self._wait_for_nncp_status_update(initial_transition_time=initial_success_status_time)
+
+    def _get_last_successful_transition_time(self) -> str | None:
+        for condition in self.instance.status.conditions:
+            if (
+                condition["type"] == self.Conditions.Type.AVAILABLE
+                and condition["status"] == Resource.Condition.Status.TRUE
+                and condition["reason"] == self.Conditions.Reason.SUCCESSFULLY_CONFIGURED
+            ):
+                return condition["lastTransitionTime"]
+        return None
+
+    @retry(
+        wait_timeout=TIMEOUT_1MINUTE,
+        sleep=TIMEOUT_5SEC,
+    )
+    def _wait_for_nncp_status_update(self, initial_transition_time: str) -> bool:
+        date_format = "%Y-%m-%dT%H:%M:%SZ"
+        formatted_initial_transition_time = datetime.strptime(initial_transition_time, date_format)
+        for condition in self.instance.get("status", {}).get("conditions", []):
+            if (
+                condition["type"] == self.Conditions.Type.AVAILABLE
+                and condition["status"] == Resource.Condition.Status.TRUE
+                and datetime.strptime(condition["lastTransitionTime"], date_format) > formatted_initial_transition_time
+            ):
+                return True
+        return False
+
     @property
     def status(self):
         for condition in self.instance.status.conditions:
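
In the hunk above, `_wait_for_nncp_status_update` returns `False` until the timestamp advances, and the `@retry(wait_timeout=..., sleep=...)` decorator appears to be what turns that single check into a poll. The sketch below approximates that poll-until-truthy behavior with the standard library only. It is not the `timeout_sampler` implementation; `wait_until`, `status_updated`, and the timing values are hypothetical names and numbers chosen for illustration.

```python
import time


def wait_until(predicate, wait_timeout, sleep):
    """Call predicate repeatedly until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + wait_timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {wait_timeout}s")
        time.sleep(sleep)


# Hypothetical stand-in for a status check that only succeeds on the third poll.
attempts = {"count": 0}


def status_updated():
    attempts["count"] += 1
    return attempts["count"] >= 3


result = wait_until(status_updated, wait_timeout=1, sleep=0.01)
```

A check that never becomes truthy raises `TimeoutError` here, so a caller cannot silently proceed to delete the NNCP while the status still reflects the earlier change.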
