
fix(onboard): fail fast in preflight when all dashboard ports are occupied (#3953) #3980

Open
nvshaxie wants to merge 1 commit into main from fix/dashboard-port-preflight-3953

Conversation

nvshaxie commented May 21, 2026

Summary

  • `findAvailableDashboardPort` already raises the right "All dashboard ports in range 18789-18799 are occupied …" error when every port is held, but it only fires late in onboarding (after preflight, gateway start, and inference selection). Reporters expect the failure up front.
  • Add a narrow `preflightDashboardPortRangeAvailability()` helper in `src/lib/onboard/dashboard-port.ts` that only checks host bindings (safe to run before OpenShell is reachable) and call it at the end of `preflight()`. It exits 1 with the same canonical message when every port in `DASHBOARD_PORT_RANGE_START..END` is bound.
  • Drop the unused `findDashboardForwardOwner` import/re-export from `src/lib/onboard.ts` to keep the entrypoint net-neutral per the `onboard-entrypoint-budget` workflow (the only remaining caller is `dashboard-port.test.ts`, which imports it directly from the module).

Bug reproduction (pre-fix)

```
$ for p in $(seq 18789 18799); do python3 -m http.server "$p" &>/dev/null & done
$ nemoclaw onboard --non-interactive --yes-i-accept-third-party-software --name overflow-test ...
[1/8] Preflight checks ← passes
[2/8] Starting OpenShell gateway ← runs anyway
[3/8] Configuring inference (NIM) ← runs anyway
Inference options menu … ← reaches the wizard
```

Behavior post-fix (same setup)

```
[1/8] Preflight checks
✓ Docker is running … ✓ Memory OK …
All dashboard ports in range 18789-18799 are occupied:
18789 → non-OpenShell host listener
18790 → non-OpenShell host listener
...
18799 → non-OpenShell host listener
Free a sandbox or use --control-ui-port with a port outside this range.
```

Exit 1. `[2/8] Starting OpenShell gateway` is never printed.

Why the helper is sound at this stage

  • `findAvailableDashboardPort` needs `openshell forward list` so it can distinguish "this port is bound by an OpenShell forward that belongs to this sandbox" from "host listener". The preflight helper deliberately skips that distinction and treats every bound port as a non-OpenShell listener.
  • That is correct here: if every port in the range is bound (whether by external processes or by other sandboxes), the new sandbox cannot use any of them, and operator intervention is required either way (`--control-ui-port <N>` or freeing a sandbox).
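
A minimal sketch of the host-only check described above, not the code in this PR. The function name `preflightPortRangeSketch`, the `PreflightDeps` shape, and the assumption that the port probe is synchronous are all hypothetical; only the range bounds and the message text come from this description.

```typescript
// Range bounds quoted in the PR description.
const DASHBOARD_PORT_RANGE_START = 18789;
const DASHBOARD_PORT_RANGE_END = 18799;

// Hypothetical dependency bag so the probe, the exit call, and the logger
// can all be replaced in tests. The real helper's signature is not shown here.
interface PreflightDeps {
  isPortBoundOnHost: (port: number) => boolean; // assumed synchronous
  exit: (code: number) => void;
  logError: (line: string) => void;
}

function preflightPortRangeSketch(deps: PreflightDeps): void {
  const bound: number[] = [];
  for (let port = DASHBOARD_PORT_RANGE_START; port <= DASHBOARD_PORT_RANGE_END; port++) {
    // A single free port is enough for onboarding to continue.
    if (!deps.isPortBoundOnHost(port)) return;
    bound.push(port);
  }
  // Every port was bound. No OpenShell forward lookup is done, so each
  // port is reported as a plain host listener.
  deps.logError(
    `All dashboard ports in range ${DASHBOARD_PORT_RANGE_START}-${DASHBOARD_PORT_RANGE_END} are occupied:`,
  );
  for (const port of bound) {
    deps.logError(`${port} → non-OpenShell host listener`);
  }
  deps.logError("Free a sandbox or use --control-ui-port <N> with a port outside this range.");
  deps.exit(1);
}
```

In production code the dependencies would presumably be the real host probe, `process.exit`, and `console.error`; in tests they can be plain closures that record what was called.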

Test plan

  • New `describe("preflightDashboardPortRangeAvailability (#3953)")` block in `src/lib/onboard/dashboard-port.test.ts` with 3 cases (every port bound → exit 1 with canonical message; 10 of 11 bound with 1 free → no exit; no ports bound → no exit). Uses injected stubs for `isPortBoundOnHost` and `process.exit`.
  • `npx vitest run src/lib/onboard/dashboard-port.test.ts` — 11/11 pass.
  • Manual on Ubuntu 24.04 x86_64: occupied 18789-18799 with `python3 -m http.server`, ran `node bin/nemoclaw.js onboard --non-interactive ...`. Exits at the end of `[1/8] Preflight checks` with the canonical message and the per-port list; `[2/8] Starting OpenShell gateway` never prints.
  • `src/lib/onboard.ts` diff stat: `+2/-2` (net 0), within the entrypoint budget.
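
One way an injected `process.exit` stub can stop control flow the way a real exit would is to throw a sentinel that the test harness catches. The sketch below illustrates that technique only; `captureExit` and `failIfAllBound` are hypothetical stand-ins, and the actual tests in `dashboard-port.test.ts` may be structured differently.

```typescript
type Exit = (code: number) => never;

// Runs `run` with an exit stub and returns the exit code it was called
// with, or null if the stub was never called.
function captureExit(run: (exit: Exit) => void): number | null {
  let exitCode: number | null = null;
  const halt = new Error("exit stub called");
  const exit: Exit = (code) => {
    exitCode = code;
    throw halt; // unwind the stack like a real process exit would
  };
  try {
    run(exit);
  } catch (err) {
    if (err !== halt) throw err; // unrelated failures still surface
  }
  return exitCode;
}

// Hypothetical stand-in for the helper under test.
function failIfAllBound(
  ports: number[],
  isBound: (port: number) => boolean,
  exit: Exit,
): void {
  if (ports.every((port) => isBound(port))) exit(1);
}

// The three cases from the test plan, over the 11-port default range.
const range = Array.from({ length: 11 }, (_, i) => 18789 + i);
const allBound = captureExit((exit) => failIfAllBound(range, () => true, exit));
const oneFree = captureExit((exit) => failIfAllBound(range, (p) => p !== 18799, exit));
const noneBound = captureExit((exit) => failIfAllBound(range, () => false, exit));
```

Throwing from the stub keeps any code after the `exit(1)` call from running in the test, which a stub that merely records the code would not guarantee.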

Fixes #3953.

Signed-off-by: Shawn Xie <shaxie@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Added an early-stage validation during onboarding to detect when all dashboard ports are exhausted, preventing users from progressing further without available ports.
  • Tests

    • Added comprehensive test coverage for the new port availability preflight check, including scenarios for fully occupied, partially available, and completely free port ranges.


…upied (#3953)

`findAvailableDashboardPort` already raises the right error
("All dashboard ports in range 18789-18799 are occupied …") when every
port is held, but it only runs late in onboarding — after preflight,
gateway start, and inference selection have already had side effects.
A user with all 11 dashboard ports held by external processes
(python3, etc.) saw onboard proceed all the way to the inference menu
before any port-related failure surfaced, contradicting the contract
in the docs / test plan.

Add a narrow `preflightDashboardPortRangeAvailability()` helper to
`src/lib/onboard/dashboard-port.ts` and call it from `preflight()`
right before it returns the GPU detection. The helper only checks
host bindings (it does not need OpenShell forward state, so it is
safe before gateway start). When every port in
`DASHBOARD_PORT_RANGE_START..END` is bound, it prints the same
canonical message and exits 1.

To keep `src/lib/onboard.ts` net-neutral per the
`onboard-entrypoint-budget` workflow, the unused
`findDashboardForwardOwner` import / re-export is dropped (no
callers outside the colocated unit test, which imports it directly
from `./onboard/dashboard-port`). Net delta on `onboard.ts`: +2/-2.

Test plan
- Added 3 unit tests in `src/lib/onboard/dashboard-port.test.ts`
  with injected `isPortBoundOnHost`/`process.exit` stubs:
  * exits 1 with the canonical message when every port in the range
    is bound; stderr lists each port → non-OpenShell host listener.
  * returns without exiting when 10 of 11 are bound and 1 is free.
  * returns without exiting when no port is bound.
- `npx vitest run src/lib/onboard/dashboard-port.test.ts` — 11/11 pass.
- Manual end-to-end on Ubuntu 24.04 x86_64:
  * Occupy 18789-18799 with `python3 -m http.server` placeholders.
  * `nemoclaw onboard --non-interactive ...` now exits inside
    `[1/8] Preflight checks` with
    `All dashboard ports in range 18789-18799 are occupied:`
    followed by one line per occupied port. `[2/8] Starting OpenShell
    gateway` and `[3/8] Configuring inference` are never printed.

Signed-off-by: Shawn Xie <shaxie@nvidia.com>
nvshaxie requested review from cv and ericksoa May 21, 2026 07:09

coderabbitai Bot commented May 21, 2026

📝 Walkthrough


This PR adds an early preflight check that fails fast when all dashboard ports (18789–18799) are occupied by external processes. The new preflightDashboardPortRangeAvailability function scans the entire range, detects full exhaustion, logs a detailed error message per port, and exits with code 1. If at least one port is free, it returns cleanly to allow onboarding to continue. The check is wired into the onboard preflight sequence before later inference steps proceed.

Changes

Dashboard Port Exhaustion Preflight

| Layer | File(s) | Summary |
|---|---|---|
| Dashboard port exhaustion check implementation and tests | `src/lib/onboard/dashboard-port.ts`, `src/lib/onboard/dashboard-port.test.ts` | `preflightDashboardPortRangeAvailability` iterates the dashboard port range with injectable port-bound checks, returns early if any port is free, and exits with non-zero code and per-port error listing if all are occupied. Tests verify full exhaustion (exit + error), partial availability (clean return), and zero-bound scenarios. |
| Onboard preflight integration | `src/lib/onboard.ts` | Imports the new preflight function, removes unused `findDashboardForwardOwner` from exports, and calls `preflightDashboardPortRangeAvailability()` at the end of `preflight()` to catch port exhaustion before onboarding proceeds. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

NemoClaw CLI, fix

Suggested reviewers

  • ericksoa

Poem

🐰 Onboard now checks the port domain,
Before the inference steps begin their reign,
If all ports 18789 are taken tight,
We fail fast with an error, clear and bright!
No silent slides past—just honest sight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a fail-fast preflight check for when all dashboard ports are occupied, which is the core objective of this PR. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #3953 by implementing a preflight check that detects when all dashboard ports are occupied and fails fast with the required canonical error message and exit code 1. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to implementing the preflight dashboard port check and removing an unused export, with no unrelated modifications detected. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@github-actions

E2E Advisor Recommendation

Required E2E: cloud-onboard-e2e, onboard-negative-paths-e2e
Optional E2E: double-onboard-e2e, dashboard-remote-bind-e2e

Dispatch hint: cloud-onboard-e2e,onboard-negative-paths-e2e


Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • cloud-onboard-e2e (~45 min): Exercises the primary non-interactive cloud onboarding path through preflight, gateway start, sandbox creation, policies, and inference.local. This should verify the new dashboard-port preflight scan does not falsely fail on a clean runner.
  • onboard-negative-paths-e2e (~75 min): Covers onboarding negative/edge paths and live non-interactive onboarding, including friendly port-conflict failures without stack traces. This is the closest existing E2E guard for the new fail-fast preflight behavior.

Optional E2E

  • double-onboard-e2e (~90 min): Useful adjacent coverage for dashboard port reuse/allocation across repeated onboarding and multiple sandboxes, including checks that port 18789 conflicts do not regress during re-onboard.
  • dashboard-remote-bind-e2e (~90 min Brev branch-validation wrapper): Optional dashboard-forwarding regression coverage because the PR touches dashboard port utilities, though it focuses on bind address behavior rather than range exhaustion.

New E2E recommendations

  • dashboard-port-exhaustion-preflight (high): No existing E2E appears to bind every dashboard port in 18789-18799 and assert onboarding exits during preflight before starting the OpenShell gateway or creating a sandbox. The new unit tests cover the helper, but the PR changes the real onboarding phase ordering and side-effect boundary.
    • Suggested test: Add a dashboard port exhaustion case to test/e2e/test-onboard-negative-paths.sh, or create dashboard-port-exhaustion-preflight-e2e, that binds 18789-18799 with local listeners, runs non-interactive onboard, verifies the canonical 'All dashboard ports in range 18789-18799 are occupied' message and --control-ui-port hint, and asserts no gateway/sandbox side effects occurred.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: cloud-onboard-e2e,onboard-negative-paths-e2e

@github-actions

PR Review Advisor

Recommendation: blocked
Confidence: high
Analyzed HEAD: a916031de0731b043bd2ee81c2f2429f32650318
Findings: 1 blocker(s), 2 warning(s), 0 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations: Review is based on the provided trusted metadata and repository read-only inspection; tests and package-manager commands were not executed. CI and E2E Advisor were still pending for the provided head SHA, so final pass/fail status is unknown. No review-thread state beyond the supplied GraphQL snapshot was available; CodeRabbit was still processing. Linked issue had no comments in the provided context, so acceptance mapping covers the issue body only.


Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: a916031de0731b043bd2ee81c2f2429f32650318
Recommendation: blocked
Confidence: high

The implementation adds useful early dashboard-port exhaustion detection, but it blocks the documented --control-ui-port escape hatch and current hard gates are not satisfied (CI pending, merge blocked, E2E Advisor not complete).

Gate status

  • CI: pending — 12 status context(s) appear pending; GraphQL shows E2E recommendation, PR review advisor, CodeQL, ShellCheck, checks, unit-vitest-linux, build-sandbox-images, build-sandbox-images-arm64, cli-parity, and macos-e2e not completed for a916031.
  • Mergeability: fail — mergeStateStatus=BLOCKED; reviewDecision=REVIEW_REQUIRED; mergeable_state=blocked.
  • Review threads: unknown — No unresolved review threads were reported, but CodeRabbit is still pending/in progress and review thread state is not fully authoritative.
  • Risky code tested: warning — Risky area detected: onboarding/host glue. Unit tests were added for the new helper, but runtime preflight/onboard behavior and --control-ui-port bypass semantics need E2E or integration coverage.

🔴 Blockers

  • Unconditional dashboard-range preflight breaks the documented --control-ui-port escape hatch (src/lib/onboard.ts:3844): preflightDashboardPortRangeAvailability() is called unconditionally after the existing explicit dashboard-port preflight. If all default ports 18789-18799 are bound, onboarding exits even when the user supplied --control-ui-port with a free port outside that range. This contradicts the issue/PR guidance that users can recover by using --control-ui-port outside the exhausted range, and also conflicts with createSandbox(), whose allocation path intentionally accepts a preferred port outside the default range.
    • Recommendation: Skip the range-exhaustion preflight when an explicit control-ui-port/dashboard port is configured, or have the helper accept the selected override port and return success when that override is free. Add a negative test covering all default ports bound plus a free explicit outside-range --control-ui-port path.
    • Evidence: onboard.ts:3670-3675 documents that when --control-ui-port is set, preflight should check that port instead of default auto-allocation behavior; onboard.ts:3844 then unconditionally scans the default range and exits. dashboard-port.ts:204-209 documents --control-ui-port as the intervention path, but the new helper exits before createSandbox() can use an outside preferredPort.
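
The recommendation above (let the check accept the selected override port and succeed when that port is free) could look roughly like the sketch below. This is an illustration under stated assumptions, not the fix for this PR: `checkDashboardPortPreflight`, `PreflightVerdict`, the synchronous probe, and the wording of the explicit-port message are all hypothetical. It returns a verdict instead of exiting so the caller decides how to fail.

```typescript
type PortProbe = (port: number) => boolean; // assumed synchronous host probe

type PreflightVerdict =
  | { ok: true; port: number }
  | { ok: false; message: string };

function checkDashboardPortPreflight(
  explicitPort: number | null, // e.g. the value of --control-ui-port, if given
  isBound: PortProbe,
  rangeStart = 18789,
  rangeEnd = 18799,
): PreflightVerdict {
  if (explicitPort !== null) {
    // An explicit override makes the default range irrelevant:
    // only the chosen port is probed.
    return isBound(explicitPort)
      ? { ok: false, message: `Dashboard port ${explicitPort} is already in use.` }
      : { ok: true, port: explicitPort };
  }
  for (let port = rangeStart; port <= rangeEnd; port++) {
    // First free port is a candidate only; real allocation happens later.
    if (!isBound(port)) return { ok: true, port };
  }
  return {
    ok: false,
    message: `All dashboard ports in range ${rangeStart}-${rangeEnd} are occupied`,
  };
}
```

With this shape, the missing negative test from the recommendation is a single call: every default port bound, an explicit outside-range port free, and the verdict is expected to be `ok`.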

🟡 Warnings

  • Required E2E signal is missing for onboarding host-glue change: This PR changes preflight behavior in the onboarding path, affecting runtime sequencing before gateway startup and sandbox creation. The E2E Advisor check is still in progress and no E2E Advisor comment/recommendation was available, so required E2E coverage for this head SHA cannot be verified.
    • Recommendation: Wait for the E2E Advisor output and required E2E jobs to complete for a916031. At minimum, verify an onboarding negative-path E2E where all dashboard ports are occupied aborts before gateway startup, plus a positive override path using --control-ui-port outside 18789-18799.
    • Evidence: GraphQL statusCheckRollup shows E2E recommendation IN_PROGRESS and macos-e2e QUEUED; trusted context marks testDepth.verdict=e2e_required for src/lib/onboard.ts and src/lib/onboard/dashboard-port.ts.
  • High active overlap on onboard.ts increases drift risk (src/lib/onboard.ts:3844): The patch touches src/lib/onboard.ts, a large active monolith with many concurrent open PRs touching the same file. While the specific inserted call still applies cleanly to current code, this has elevated rebase/regression risk around preflight sequencing.

🔵 Suggestions

  • None.

Acceptance coverage

  • met — When all dashboard ports in the default 18789–18799 range are held by external processes, nemoclaw onboard continues instead of failing fast with a clear "all dashboard ports occupied" error.: dashboard-port.ts adds preflightDashboardPortRangeAvailability(), which scans 18789-18799 using isPortBoundOnHost and exits 1 if every port is bound; onboard.ts calls it at the end of preflight before step 2.
  • met — Component area: Onboard (preflight, dashboard port allocation).: Changed files are src/lib/onboard.ts and src/lib/onboard/dashboard-port.ts/test.ts, directly in onboard preflight/dashboard allocation.
  • met — 1. Start placeholder servers on all 11 ports:: Unit test uses allBound to simulate every port in the default range as bound; no real placeholder servers are used in tests.
  • partial — 2. Verify ports:: The helper uses isPortBoundOnHost, which relies on lsof/sudo lsof/Node bind probes in production. The added unit tests stub port-bound behavior but do not verify real ss/lsof output.
  • partial — 3. Run onboarding:: onboard.ts calls the helper during preflight, but the tests exercise the helper directly rather than running the full onboard command path.
  • partial — 4. Observe the preflight and subsequent steps.: The helper unit test verifies process.exit(1) and stderr content, but there is no automated assertion that [2/8] Starting OpenShell gateway is not printed or that later steps are not reached.
  • met — Preflight should detect that all candidate dashboard ports (18789–18799) are occupied by non-OpenShell processes.: preflightDashboardPortRangeAvailability() returns only if it finds a free port; otherwise it reports all scanned ports as non-OpenShell host listener.
  • partial — nemoclaw onboard should abort before the inference step with a fatal error similar to:: The call is placed before gateway/inference setup and exits 1. However, the helper is direct-tested only; no full onboarding test proves abort ordering before inference.
  • met — All dashboard ports in range 18789-18799 are occupied:: dashboard-port.test.ts asserts stderr matches All dashboard ports in range 18789-18799 are occupied:.
  • partial — - 18789 → non-OpenShell host listener: The implementation emits 18789 → non-OpenShell host listener without the dash shown in the issue example. The text is otherwise semantically equivalent and tests assert this implemented format.
  • partial — - 18790 → non-OpenShell host listener: The helper emits every bound port line without dash prefixes. Tests explicitly check 18789 and 18799, but not every intermediate port line.
  • partial — - 18799 → non-OpenShell host listener: dashboard-port.test.ts asserts stderr contains 18799 → non-OpenShell host listener; formatting lacks the issue example dash.
  • partial — Exit code should be non-zero; no gateway or sandbox should be created.: The helper test verifies exit code 1. Placement before step 2 should prevent gateway/sandbox creation, but no integration/E2E test verifies absence of side effects.
  • partial — After freeing the ports and rerunning nemoclaw onboard --name overflow-test, onboarding should succeed and assign 18789 as the dashboard port.: Existing findAvailableDashboardPort tests verify 18789 is returned when free. No full rerun/onboard E2E test verifies success and assignment after freeing ports.
  • met — Users with many external processes on 18789–18799 may see onboarding proceed without a dashboard, with no clear guidance on port conflicts.: The new helper emits a clear error and guidance: Free a sandbox or use --control-ui-port <N> with a port outside this range.
  • partial — At preflight or early in onboard, scan the dashboard port range (default 18789–18799, or the configured range) for listeners.: The helper scans the default 18789-18799 range at preflight. It does not account for a configured/explicit outside-range dashboard port and can incorrectly exit despite a valid configured port.
  • partial — If all ports are occupied by non-OpenShell processes, fail fast with a clear fatal error that lists each port and process and suggests either:: The helper fails fast and lists each port as non-OpenShell host listener, but it does not list actual process names/PIDs.
  • met — Freeing some ports; or: Error message says Free a sandbox and tests assert the canonical message path; wording is narrower than freeing some ports but communicates freeing capacity.
  • partial — Setting NEMOCLAW_DASHBOARD_PORT / --control-ui-port to a different range.: The error suggests --control-ui-port <N>, but the new unconditional preflight would still fail if the default range remains exhausted and the user picks a free outside-range --control-ui-port. It also does not mention NEMOCLAW_DASHBOARD_PORT.

Security review

  • pass — 1. Secrets and Credentials: No hardcoded secrets, tokens, keys, credential stores, or secret logging were added. Changes are limited to dashboard port probing, preflight invocation, and tests.
  • warning — 2. Input Validation and Data Sanitization: No new untrusted string parsing is introduced, but the helper currently ignores configured dashboard-port input/override semantics and unconditionally validates the default range. This is primarily correctness rather than injection risk.
  • pass — 3. Authentication and Authorization: No endpoints, auth checks, token validation, or authorization logic are modified.
  • pass — 4. Dependencies and Third-Party Libraries: No dependencies or third-party libraries are added or changed.
  • warning — 5. Error Handling and Logging: The new error path intentionally exits with a clear message and does not leak secrets. However, it can emit misleading remediation guidance because --control-ui-port <N> is suggested but not honored when the default range remains fully occupied.
  • pass — 6. Cryptography and Data Protection: Not applicable — no cryptographic operations or data protection mechanisms are changed.
  • warning — 7. Configuration and Security Headers: The change affects runtime configuration behavior for dashboard port selection. It may block valid explicit configuration via --control-ui-port/NEMOCLAW_DASHBOARD_PORT when the default range is exhausted.
  • warning — 8. Security Testing: Unit tests cover all-bound, partially-bound, and none-bound helper cases. They do not cover the configured outside-range override path or full preflight side-effect boundary; E2E remains pending.
  • warning — 9. Holistic Security Posture: Fail-fast behavior improves operational clarity and reduces unnecessary gateway/sandbox side effects. The unconditional check can cause denial of valid onboarding configurations and should be corrected before merge.

Test / E2E status

  • Test depth: e2e_required — Runtime/sandbox/infrastructure paths need real execution coverage: src/lib/onboard.ts and src/lib/onboard/dashboard-port.ts alter preflight behavior before gateway start. Unit tests validate the helper in isolation but do not prove full CLI sequencing, no gateway creation, or override behavior.
  • E2E Advisor: missing (not found)
  • Required E2E jobs: E2E recommendation
  • Missing for analyzed SHA: E2E recommendation

✅ What looks good

  • The helper is narrow, dependency-injectable, and covered by focused unit tests for all-bound, partially-bound, and no-bound cases.
  • The call is placed before gateway startup, which is the right general lifecycle boundary for avoiding unnecessary side effects when the default range is truly exhausted.
  • The change removes an unused re-export from the large onboard entrypoint and the onboard-entrypoint-budget check passed.
  • No new dependencies, credentials handling, shell string execution, or network fetch paths were introduced.

Review completeness

  • Review is based on the provided trusted metadata and repository read-only inspection; tests and package-manager commands were not executed.
  • CI and E2E Advisor were still pending for the provided head SHA, so final pass/fail status is unknown.
  • No review-thread state beyond the supplied GraphQL snapshot was available; CodeRabbit was still processing.
  • Linked issue had no comments in the provided context, so acceptance mapping covers the issue body only.
  • Human maintainer review required: yes


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Line 3844: preflightDashboardPortRangeAvailability() is being called
unconditionally which causes failures even when the user provided an explicit
--control-ui-port; update the logic so the range-exhaustion check runs only when
no explicit control UI port override is present — detect the override (the CLI
option or config value that represents the explicit port, e.g. controlUiPort /
controlUiPortOverride / options.controlUiPort or the corresponding env var used
earlier in the file) and skip calling preflightDashboardPortRangeAvailability()
when that override is set, leaving the existing explicit-port path intact (the
explicit-port handling you preserved earlier) so the onboarding respects
--control-ui-port inputs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a75a1aed-7ee0-41b2-be45-486dcc29bbfe

📥 Commits

Reviewing files that changed from the base of the PR and between 18c7265 and a916031.

📒 Files selected for processing (3)
  • src/lib/onboard.ts
  • src/lib/onboard/dashboard-port.test.ts
  • src/lib/onboard/dashboard-port.ts

Comment thread src/lib/onboard.ts
}
}

preflightDashboardPortRangeAvailability(); // #3953 — fail fast on dashboard-port exhaustion before step 2

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor explicit --control-ui-port overrides before failing on range exhaustion.

Line 3844 unconditionally checks the default dashboard range, which breaks the explicit-port path you already preserve in Lines 3666-3674. If a user reruns with --control-ui-port 3000, onboarding can still exit just because 18789-18799 are full, even though that range is no longer relevant.

💡 Suggested fix
-  preflightDashboardPortRangeAvailability(); // `#3953` — fail fast on dashboard-port exhaustion before step 2
+  if (_preflightDashboardPort === null) {
+    preflightDashboardPortRangeAvailability(); // `#3953` — fail fast on dashboard-port exhaustion before step 2
+  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard.ts` at line 3844, preflightDashboardPortRangeAvailability()
is being called unconditionally which causes failures even when the user
provided an explicit --control-ui-port; update the logic so the range-exhaustion
check runs only when no explicit control UI port override is present — detect
the override (the CLI option or config value that represents the explicit port,
e.g. controlUiPort / controlUiPortOverride / options.controlUiPort or the
corresponding env var used earlier in the file) and skip calling
preflightDashboardPortRangeAvailability() when that override is set, leaving the
existing explicit-port path intact (the explicit-port handling you preserved
earlier) so the onboarding respects --control-ui-port inputs.

wscurran added the bug, fix, and NemoClaw CLI labels May 21, 2026


Development

Successfully merging this pull request may close these issues.

[NemoClaw][All Platforms][Onboard] Onboard does not fail fast when all dashboard ports 18789-18799 are occupied by external processes
