[Fixes: mosip/mosip-infra#1874] Added wg-onboard.sh for WireGuard onboarding automation#247
Conversation
This script automates the onboarding process for WireGuard in a new environment, handling peer allocation and configuration. Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
|
Warning Review limit reached
More reviews will be available in 47 minutes and 1 second. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughA new Bash script ChangesWireGuard Self-Service Onboarding Script
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (1)
.github/scripts/wg-onboard.sh (1)
43-43: ⚡ Quick winUse
SECRET_NAMESto drive publishing, or remove it.ShellCheck is right: the declared mapping is currently unused while Lines 236-238 duplicate the same names manually. Use the array for publishing to keep the mapping single-sourced.
♻️ Proposed refactor
log "Publishing environment secrets ..." -printf '%s' "$TF_CONF" | gh secret set TF_WG_CONFIG --env "$ENV_NAME" --repo "$REPO" --body - -printf '%s' "$WG0_CONF" | gh secret set CLUSTER_WIREGUARD_WG0 --env "$ENV_NAME" --repo "$REPO" --body - -printf '%s' "$WG1_CONF" | gh secret set CLUSTER_WIREGUARD_WG1 --env "$ENV_NAME" --repo "$REPO" --body - +CONFIG_VALUES=("$TF_CONF" "$WG0_CONF" "$WG1_CONF") +for i in "${!SECRET_NAMES[@]}"; do + printf '%s' "${CONFIG_VALUES[$i]}" | + gh secret set "${SECRET_NAMES[$i]}" --env "$ENV_NAME" --repo "$REPO" --body - +doneAlso applies to: 235-238
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/scripts/wg-onboard.sh at line 43, The SECRET_NAMES array is declared at the top of the script but is not being used, while the same secret names are manually hardcoded elsewhere in the publishing logic. Refactor the code where the secrets are being published (the section that currently manually specifies these names) to iterate over or reference the SECRET_NAMES array instead. This will keep the mapping single-sourced and eliminate the duplication that ShellCheck is flagging.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/scripts/wg-onboard.sh:
- Around line 128-136: The script does not honor the documented assigned.txt
format which uses `: available` to mark free peers. The parser in the TAKEN
array initialization (lines 128-136) incorrectly marks any non-empty suffix as
taken, including `: available`. Additionally, the writer at lines 225-227
appends rows without the documented `:` separator. Fix the parser to check if
the rest variable contains `: available` and only mark it as taken if it
contains something other than that marker. Fix the writer to include the proper
`:` separator when appending new entries, ensuring consistency with the
documented format throughout the script.
- Around line 119-120: The read-pick-append sequence for peer allocation has a
race condition where multiple onboarding runs can read the same assigned.txt
file, select the same free peers, and both append allocations. Consolidate the
read operation (currently at lines 119-120 where ASSIGNED_CONTENT is read via
ssh_cmd) together with the subsequent peer selection and append logic (which
occurs later in the script at lines 166-179 and 223-227) into a single remote
flock-guarded critical section. Move all three affected locations so that the
entire read-select-append sequence for peer allocation happens atomically within
one flock-protected block on the jumpserver, executed before any secrets are
published. This ensures that only one onboarding run can read and allocate peers
at a time, eliminating the race window.
- Around line 74-85: The parameter parsing loop in wg-onboard.sh (around the
case statement with --env, --host, --ssh-key, etc.) does not validate that flag
values exist before accessing `$2`. This causes issues when set -u is active or
when a flag is the last argument or followed by another flag. For each case
statement that reads a value from `$2` (--env, --host, --ssh-key, --repo,
--ticket, --wg-dir, --allowed-ips, --tf-peer, --wg0-peer, --wg1-peer), add a
guard to check that `$2` exists and is not empty before using it. If the guard
fails, either exit with an error message or skip that argument. This ensures the
script properly handles missing values and prevents flags from being consumed as
values.
- Around line 139-145: The find_assigned_peer function uses index(desc, env) for
substring matching, which causes false positive collisions where `dev` can match
`dev2` or `predev`. Replace the substring matching logic with explicit boundary
parsing that properly identifies and matches the environment label as a complete
token rather than a substring. This ensures exact environment matching while
preserving the secret suffix matching behavior in the same conditional check.
- Around line 244-247: The grep command uses a PCRE pattern with ENV_NAME
interpolated directly, which is unsafe because ENV_NAME might contain regex
metacharacters that delete wrong rows. Additionally, grep errors are silently
masked by the `|| true` clause before the data is moved, risking data loss.
Replace the regex-based filtering approach in this section with a literal TSV
field comparison using awk or a similar tool that compares the first
tab-delimited field exactly as a string, without interpreting special
characters. This ensures only rows with matching ENV_NAME values are removed and
any errors during filtering are properly surfaced before the file is replaced.
- Around line 182-187: The sanity check for the forced peer assignments
(TF_PEER, WG0_PEER, WG1_PEER) currently only verifies that the peers exist and
are distinct from each other, but does not verify that they are not already
assigned to other environments. Add an additional check in the loop that
validates each peer is not already allocated to an environment before allowing
the assignment. This check should examine the configuration to determine which
environment each peer is currently assigned to and reject any forced peer that
is already in use, preventing unintended reallocation of existing peer
assignments.
- Around line 231-233: The GitHub REST API endpoint in the gh api call requires
the environment name to be URL-encoded when included in the path parameter.
Since the environment name stored in ENV_NAME can contain special characters
like slashes (e.g., feature/foo), unencoded values will cause the API call to
fail with a 404 error. Before calling gh api for the
repos/${REPO}/environments/${ENV_NAME} endpoint, URL-encode the ENV_NAME
variable using a shell function or utility (such as using printf with %s or
leveraging jq's `@uri` encoding if available), and use the encoded value in the
endpoint path instead of the raw ENV_NAME variable.
- Around line 113-115: The ssh_cmd function currently disables SSH host key
verification by setting StrictHostKeyChecking=no, which creates a security
vulnerability allowing MITM or DNS poisoning attacks to inject malicious
WireGuard configs. Create a transient known-hosts file at script startup (before
calling ssh_cmd), then modify the ssh_cmd function to replace
StrictHostKeyChecking=no with StrictHostKeyChecking=accept-new or
StrictHostKeyChecking=yes and add a UserKnownHostsFile option pointing to the
provisioned known-hosts file. Follow the existing patterns used in
rke2-cluster/ansible/run-ansible.sh and Helmsman utilities for host key
verification as a reference implementation.
---
Nitpick comments:
In @.github/scripts/wg-onboard.sh:
- Line 43: The SECRET_NAMES array is declared at the top of the script but is
not being used, while the same secret names are manually hardcoded elsewhere in
the publishing logic. Refactor the code where the secrets are being published
(the section that currently manually specifies these names) to iterate over or
reference the SECRET_NAMES array instead. This will keep the mapping
single-sourced and eliminate the duplication that ShellCheck is flagging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f1800e43-77af-4198-b849-81aa66b8db48
📒 Files selected for processing (1)
.github/scripts/wg-onboard.sh
Updated the peer allocation process to support dynamic peer numbering and improved handling of assigned.txt formats. Enhanced logging and error handling for better clarity and debugging. Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/scripts/wg-onboard.sh (1)
37-37:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate
ALLOWED_IPSis a valid CIDR.The script accepts any value for
--allowed-ipswithout validating CIDR format. An invalid value like172.31.0.0/99ornot-a-cidrwill be written into the WireGuard configs (line 509) and cause WireGuard connection failures that only surface at runtime when users try to connect.🛡️ Proposed validation
Add validation after line 113:
done + +# Validate ALLOWED_IPS is a valid CIDR +if ! [[ "$ALLOWED_IPS" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$ ]]; then + die "Invalid CIDR for --allowed-ips: $ALLOWED_IPS (expected format: x.x.x.x/n)" +fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/scripts/wg-onboard.sh at line 37, Add validation for the ALLOWED_IPS variable to ensure it contains a valid CIDR notation before it gets used in the WireGuard configuration. After line 113 where ALLOWED_IPS is assigned (either from the default value at line 37 or from the --allowed-ips argument), implement a validation check that confirms the value follows proper CIDR format (valid IP address with valid prefix length like /16, not invalid values like /99 or non-CIDR strings). If the validation fails, log an error message and exit the script to prevent invalid CIDR values from being written to the WireGuard config at line 509.
♻️ Duplicate comments (1)
.github/scripts/wg-onboard.sh (1)
152-164:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftThe read-pick-append race condition remains unaddressed.
Two concurrent onboarding runs can read the same
assigned.txtsnapshot at line 154, independently select the same free peers vianext_free_peer(lines 471-482), and then both append allocations at lines 552-562. This causes peer conflicts and allocation corruption. The past review suggested moving the entire read-select-append sequence into one remoteflock-guarded critical section on the jumpserver.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/scripts/wg-onboard.sh around lines 152 - 164, The current implementation reads assigned.txt and selects free peers independently from multiple concurrent onboarding processes, creating a race condition where two runs can both read the same assigned.txt snapshot at line 154, both call next_free_peer (lines 471-482) with the same state, and both append allocations (lines 552-562), causing peer conflicts. To fix this, wrap the entire read-select-append sequence (reading ASSIGNED_CONTENT from assigned.txt, executing the next_free_peer logic, and appending new allocations) into a single atomic operation protected by a remote flock-guarded critical section on the jumpserver, ensuring that only one onboarding run can read and modify assigned.txt at a time.
🧹 Nitpick comments (1)
.github/scripts/wg-onboard.sh (1)
342-342: ⚡ Quick winOptimize IP collision check for large peer pools.
grep -qRrecursively searches all peer configs to find IP conflicts. With hundreds of peers this becomes slow. Usegrep -rl -m1to stop at the first match, or limit the search topeer*/peer*.conffiles only.♻️ Proposed optimization
- if ! grep -qR "${PROPOSED}" "$CONFIG_DIR"/peer*/*.conf 2>/dev/null; then + if ! grep -rl -m1 "${PROPOSED}" "$CONFIG_DIR"/peer*/peer*.conf 2>/dev/null | grep -q .; then CLIENT_IP="$PROPOSED" break fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/scripts/wg-onboard.sh at line 342, The grep -qR command used in the IP collision check recursively searches through all peer configs, which becomes slow with hundreds of peers. Optimize this search by either using grep -rl -m1 to stop searching after finding the first match, or by refining the file glob pattern from peer*/*.conf to peer*/peer*.conf to limit the search to only the actual configuration files instead of recursively searching all subdirectories.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/scripts/wg-onboard.sh:
- Around line 270-289: The `ensure_assigned_entry` function has a race condition
where `assigned_line_exists` checks if a peer entry exists, but another
concurrent process can append the same peer to the file before the current
process completes its append, creating duplicates. Fix this by making the
check-and-append operation atomic: consolidate the `assigned_line_exists` check
and the append operation into a single SSH command that performs both the
existence check and the write atomically, or use a file locking mechanism to
prevent concurrent modifications between the check and append steps.
- Around line 330-331: The `|| true` clause on line 330 in the docker ps command
masks detection failures by returning success even when no WireGuard container
is found, causing the variable C to be empty without surfacing the actual error.
Remove the `|| true` from the end of the docker ps pipeline that assigns to C,
allowing the command to fail naturally when grep doesn't find a matching
container. This will enable line 331's check to properly detect and report the
missing container instead of silently masking the root cause.
- Around line 228-233: The validation in the else block only checks that the
explicit --max-peers argument is a positive integer, but does not validate that
it is not less than the highest peer number already present on the jumpserver.
In the else block where the user provides an explicit --max-peers value, after
validating that it is a positive integer, you must also call highest_peer_number
to detect the currently highest peer and ensure that the provided MAX_PEERS
value is greater than or equal to that detected highest peer number. If
MAX_PEERS is less than the detected highest peer, call die with an appropriate
error message indicating that --max-peers cannot be less than the highest peer
already present.
---
Outside diff comments:
In @.github/scripts/wg-onboard.sh:
- Line 37: Add validation for the ALLOWED_IPS variable to ensure it contains a
valid CIDR notation before it gets used in the WireGuard configuration. After
line 113 where ALLOWED_IPS is assigned (either from the default value at line 37
or from the --allowed-ips argument), implement a validation check that confirms
the value follows proper CIDR format (valid IP address with valid prefix length
like /16, not invalid values like /99 or non-CIDR strings). If the validation
fails, log an error message and exit the script to prevent invalid CIDR values
from being written to the WireGuard config at line 509.
---
Duplicate comments:
In @.github/scripts/wg-onboard.sh:
- Around line 152-164: The current implementation reads assigned.txt and selects
free peers independently from multiple concurrent onboarding processes, creating
a race condition where two runs can both read the same assigned.txt snapshot at
line 154, both call next_free_peer (lines 471-482) with the same state, and both
append allocations (lines 552-562), causing peer conflicts. To fix this, wrap
the entire read-select-append sequence (reading ASSIGNED_CONTENT from
assigned.txt, executing the next_free_peer logic, and appending new allocations)
into a single atomic operation protected by a remote flock-guarded critical
section on the jumpserver, ensuring that only one onboarding run can read and
modify assigned.txt at a time.
---
Nitpick comments:
In @.github/scripts/wg-onboard.sh:
- Line 342: The grep -qR command used in the IP collision check recursively
searches through all peer configs, which becomes slow with hundreds of peers.
Optimize this search by either using grep -rl -m1 to stop searching after
finding the first match, or by refining the file glob pattern from peer*/*.conf
to peer*/peer*.conf to limit the search to only the actual configuration files
instead of recursively searching all subdirectories.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: eed3d46e-e13e-4ab8-a4f3-0c7b7883337d
📒 Files selected for processing (1)
.github/scripts/wg-onboard.sh
Use flock-guarded atomic peer allocation on the jumpserver, validate --max-peers against the highest existing peer, harden SSH and secret publishing, and remove the obsolete ensure_assigned_entry race path. Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/scripts/wg-onboard.sh:
- Around line 257-276: The `record_assignment` function uses `$line` directly in
sed replacement text, which will fail if the variable contains special sed
characters like `&` (which represents the matched pattern) or backslashes (which
escape characters). To fix this, escape these special characters in the `$line`
variable before using it in any sed replacement command by replacing `&` with
`\&` and `\` with `\\`. Apply this escaping consistently across all sed
replacement operations in the function, including both the colon format branch
(where line 263 and line 268 use sed replacements) and the non-colon format
branch (where line 272 uses sed replacement).
- Around line 350-352: The wg set command with preshared-key expects a file path
argument, not the key value itself. At the docker exec call where preshared-key
"$PSK" is passed, modify the command to write the PSK value to a temporary file
inside the container first, then pass that file path to the preshared-key option
instead of passing the variable value directly. This ensures the wg set command
can properly read the pre-shared key from a file as it expects, and clean up the
temporary file after the command completes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 393b85fc-8f6a-4992-8ea4-ec18f451be3a
📒 Files selected for processing (1)
.github/scripts/wg-onboard.sh
Escape & and backslashes before sed replacements in record_assignment, and pass the PSK via a temp file inside the container for wg set. Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
Updated lock file path and increased MAX_PEERS limit. Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
Refactor peer allocation and assignment logic in wg-onboard.sh. Update conditions for MAX_PEERS and improve assignment recording. Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/scripts/wg-onboard.sh (1)
119-120:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCIDR regex accepts out-of-range values.
The regex validates general format but permits invalid octets (e.g.,
999.999.999.999) and prefix lengths (e.g.,/99). An invalid CIDR would cause downstream WireGuard failures.🛡️ Proposed stricter validation
-[[ "$ALLOWED_IPS" =~ ^[0-9]+(\.[0-9]+){3}/[0-9]+$ ]] \ - || die "--allowed-ips must be an IPv4 CIDR (e.g. 172.31.0.0/16), got: $ALLOWED_IPS" +validate_cidr() { + local cidr="$1" ip prefix IFS='.' octets + [[ "$cidr" =~ ^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)/([0-9]+)$ ]] || return 1 + ip="${BASH_REMATCH[1]}" prefix="${BASH_REMATCH[2]}" + read -ra octets <<<"${ip//./ }" + for o in "${octets[@]}"; do + (( o <= 255 )) || return 1 + done + (( prefix <= 32 )) || return 1 +} +validate_cidr "$ALLOWED_IPS" \ + || die "--allowed-ips must be a valid IPv4 CIDR (e.g. 172.31.0.0/16), got: $ALLOWED_IPS"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/scripts/wg-onboard.sh around lines 119 - 120, The regex validation for ALLOWED_IPS in the die statement is too permissive and accepts invalid IPv4 octets (above 255) and invalid CIDR prefix lengths (above 32). Replace the current regex pattern with a stricter validation that ensures each octet is between 0 and 255 (you can use a pattern like (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) for each octet) and the prefix length after the slash is between 0 and 32, to prevent invalid CIDR notation from reaching downstream WireGuard configuration.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/scripts/wg-onboard.sh:
- Around line 677-682: The trap command setting up cleanup for the temporary
file created by mktemp in this section overwrites the existing EXIT trap that
was set earlier for SSH_KNOWN_HOSTS cleanup, causing that file to leak when the
script exits normally. Preserve the existing trap by capturing it before setting
the new one, then combine both cleanup commands in a single trap statement so
that when the script exits, both the temporary file from mktemp and the
SSH_KNOWN_HOSTS temporary file are properly removed. Make sure to avoid using
trap - EXIT which clears all traps, and instead properly restore or remove only
the trap added in this section.
---
Outside diff comments:
In @.github/scripts/wg-onboard.sh:
- Around line 119-120: The regex validation for ALLOWED_IPS in the die statement
is too permissive and accepts invalid IPv4 octets (above 255) and invalid CIDR
prefix lengths (above 32). Replace the current regex pattern with a stricter
validation that ensures each octet is between 0 and 255 (you can use a pattern
like (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) for each octet) and the prefix
length after the slash is between 0 and 32, to prevent invalid CIDR notation
from reaching downstream WireGuard configuration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 295c9806-cc65-44b8-857c-88725766c723
📒 Files selected for processing (1)
.github/scripts/wg-onboard.sh
Refactor cleanup logic for temporary SSH known hosts and TMP files. Signed-off-by: Ivanmeneges <ivan.anil016@gmail.com>
This script automates the onboarding process for WireGuard in a new environment, handling peer allocation and configuration.
Summary by CodeRabbit
AllowedIPsand removing anyDNSline).