
fix: concurrent update-core and update-module runs #1176

Merged
DavidePrincipi merged 2 commits into main from bug-7877 on May 14, 2026
Conversation

@DavidePrincipi
Member

Software update procedures must be protected from concurrent runs to avoid unexpected behaviors, like sudden service restarts.

Use fcntl flock to ensure only one action runs at a time. If the lock cannot be acquired, the action fails immediately with a validation-failed status.

The lock is enforced by the kernel and is implicitly released when the process terminates, regardless of its exit code. The rendezvous file is created only if it does not already exist (O_CREAT flag).
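The locking approach described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper name `try_lock` and the lock file path are hypothetical, and only the use of `fcntl.flock` with a non-blocking exclusive lock and `O_CREAT` comes from the description.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return a file descriptor holding an exclusive lock on `path`,
    or None if another open file description already holds it.

    `try_lock` is a hypothetical helper name used for illustration only.
    """
    # O_CREAT creates the rendezvous file only if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Keep the descriptor open: the kernel drops the lock when it is
    # closed or when the process terminates.
    return fd


if __name__ == "__main__":
    # Hypothetical lock location; the PR places lock files in AGENT_STATE_DIR.
    lock_path = os.path.join(tempfile.mkdtemp(), "example-update.lock")
    fd = try_lock(lock_path)
    print("lock acquired" if fd is not None else "lock busy")
```

Because the lock belongs to the open file description, no cleanup handler is needed: a crash or a non-zero exit releases it just like a normal exit does.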

Refs NethServer/dev#7877

Contributor

Copilot AI left a comment

Pull request overview

Adds protection against concurrent runs of the update-core (cluster-level) and update-module (per-module) actions using a non-blocking fcntl.flock on a per-action lock file in AGENT_STATE_DIR. If the lock is already held, the action fails immediately with status validation-failed and a structured error payload. The kernel releases the lock when the process exits.

Changes:

  • Acquire an exclusive non-blocking flock at the start of the cluster update-core action.
  • Acquire an exclusive non-blocking flock at the start of the agent update-module/05pullimages action.
  • On contention, emit a validation-failed status with an action_already_running error and exit with code 3.
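The contention path listed above might look roughly like the sketch below. The status name `validation-failed`, the error code `action_already_running`, and exit code 3 come from the review summary. Everything else is an assumption for illustration: the function name `acquire_or_fail`, the `set_status` callback, and the exact shape of the JSON payload are hypothetical and do not reproduce the agent library's real API.

```python
import fcntl
import json
import os
import sys


def acquire_or_fail(path, set_status, out=sys.stdout):
    """Take an exclusive non-blocking lock or abort the action.

    `set_status` is a hypothetical callback standing in for whatever the
    agent uses to report the action status.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        set_status("validation-failed")
        # Assumed payload shape: a list with one error object.
        json.dump([{"error": "action_already_running"}], out)
        raise SystemExit(3)
    return fd
```

Failing fast, rather than queueing, lets the caller see at once that an update is already in progress and decide whether to retry later.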

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| core/imageroot/var/lib/nethserver/cluster/actions/update-core/50update_core | Adds flock-based mutual exclusion for the cluster-level update-core action. |
| core/imageroot/usr/local/agent/actions/update-module/05pullimages | Adds flock-based mutual exclusion for the per-module update-module pull step. |

Comment thread core/imageroot/var/lib/nethserver/cluster/actions/update-core/50update_core Outdated
Comment thread core/imageroot/usr/local/agent/actions/update-module/05pullimages Outdated
Comment thread core/imageroot/usr/local/agent/actions/update-module/05pullimages Outdated
Software update procedures must be protected from
concurrent runs to avoid unexpected behaviors, like
sudden service restarts.

Use flock to ensure only one update action runs at
a time. The update-core and update-module/05pullimages
steps fail immediately if the lock is held, while
update-module/50run_scriptdir waits for it.

The lock is enforced by the kernel and released
implicitly when the process terminates, no matter
its exit code.
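
The waiting behavior mentioned for update-module/50run_scriptdir corresponds to a blocking flock, that is, the same call without the non-blocking flag. The sketch below shows the idea in Python. The page does not show how that step is actually implemented (it could equally be a shell script using the flock utility), so the function name `wait_for_lock` and the code are assumptions.

```python
import fcntl
import os


def wait_for_lock(path):
    """Block until an exclusive lock on `path` is obtained, then return
    the file descriptor that holds it (hypothetical helper)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    # Without LOCK_NB the call sleeps until the current holder releases
    # the lock, either explicitly or by terminating.
    fcntl.flock(fd, fcntl.LOCK_EX)
    return fd
```

Closing the returned descriptor, or exiting the process, releases the lock for the next waiter.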
@DavidePrincipi DavidePrincipi merged commit 6762689 into main May 14, 2026
1 of 2 checks passed
@DavidePrincipi DavidePrincipi deleted the bug-7877 branch May 14, 2026 13:50

Development

Successfully merging this pull request may close these issues.

Agent restart failed after concurrent update-core runs
