fix: concurrent update-core and update-module runs #1176
Merged
Pull request overview
Adds protection against concurrent runs of update-core (cluster-level) and update-module (per-module) actions using a non-blocking fcntl.flock on a per-action lock file in AGENT_STATE_DIR. If the lock is already held the action fails immediately with status validation-failed and a structured error payload, relying on the kernel to release the lock on process exit.
Changes:
- Acquire an exclusive non-blocking flock at the start of the cluster update-core action.
- Acquire an exclusive non-blocking flock at the start of the agent update-module/05pullimages action.
- On contention, emit a validation-failed status with an action_already_running error and exit with code 3.
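The fail-fast behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this pull request: the helper names `try_lock` and `guard`, the `.lock` file naming, and the JSON payload layout are all assumptions made for the example.

```python
import fcntl
import json
import os
import sys

EXIT_ALREADY_RUNNING = 3  # exit code named in the change list


def try_lock(path):
    """Return a descriptor holding an exclusive lock, or None on contention."""
    # O_CREAT creates the lock file only when it is missing.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock fail at once instead of sleeping.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def guard(action, state_dir, out=sys.stdout):
    """Return (exit_code, fd); fd must stay open while the action runs."""
    # Hypothetical naming: one lock file per action inside state_dir.
    fd = try_lock(os.path.join(state_dir, action + ".lock"))
    if fd is None:
        # Hypothetical payload shape; the real action may differ.
        payload = {"status": "validation-failed",
                   "error": "action_already_running"}
        json.dump(payload, out)
        out.write("\n")
        return EXIT_ALREADY_RUNNING, None
    return 0, fd
```

A caller would pass the returned exit code to `sys.exit` on contention, and otherwise keep the descriptor open until the action finishes so that the lock stays held.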
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| core/imageroot/var/lib/nethserver/cluster/actions/update-core/50update_core | Adds flock-based mutual exclusion for the cluster-level update-core action. |
| core/imageroot/usr/local/agent/actions/update-module/05pullimages | Adds flock-based mutual exclusion for the per-module update-module pull step. |
gsanchietti
reviewed
May 14, 2026
Software update procedures must be protected from concurrent runs to avoid unexpected behaviors, like sudden service restarts. Use flock to ensure only one update action runs at a time. The update-core and update-module/05pullimages steps fail immediately if the lock is held, while update-module/50run_scriptdir waits for it. The lock is enforced by the kernel and released implicitly when the process terminates, no matter its exit code.
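The waiting behavior described for update-module/50run_scriptdir differs from the fail-fast steps only in omitting the non-blocking flag. A minimal sketch, assuming a hypothetical context manager `wait_for_lock` that is not part of this pull request:

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def wait_for_lock(path):
    """Block until the exclusive lock on path is free, then hold it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Without LOCK_NB the call sleeps until the lock is available.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Closing the only descriptor for this open file drops the lock.
        os.close(fd)
```

Wrapping the step body in `with wait_for_lock(path):` would serialize it behind any other holder of the same lock file instead of reporting a failure.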
gsanchietti
approved these changes
May 14, 2026
Software update procedures must be protected from concurrent runs to avoid unexpected behaviors, like sudden service restarts.
Use fcntl flock to ensure only one action runs at a time. If the lock cannot be acquired, the action fails immediately with a validation-failed status.
The lock is enforced by the kernel and is implicitly released when the process terminates, no matter its exit code. The rendezvous file is created only if it does not already exist (O_CREAT flag).
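The implicit release can be checked with a small experiment: a child process takes the lock and exits abruptly without unlocking, then the parent tries to take it. The sketch below is illustrative only; the function name `lock_free_after_child_exit` and the inline child script are assumptions, not code from the commit.

```python
import fcntl
import os
import subprocess
import sys

# Child: take the lock, then exit at once without ever unlocking.
CHILD = """
import fcntl, os, sys
fd = os.open(sys.argv[1], os.O_RDWR | os.O_CREAT, 0o600)
fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
os._exit(int(sys.argv[2]))
"""


def lock_free_after_child_exit(path, exit_code):
    """Return True if the lock is available once the child has exited."""
    child = subprocess.run([sys.executable, "-c", CHILD, path, str(exit_code)])
    assert child.returncode == exit_code
    # O_CREAT is a no-op here because the child already created the file.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)
```

Because the kernel ties the lock to the open file description, no cleanup handler or stale-lock detection is needed; the lock file itself can safely remain on disk between runs.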
Refs NethServer/dev#7877