fix(doctor): handle docker-driver gateway mode (resolver + skip k3s port check) #3941
luisignaciomaiz-cmyk wants to merge 2 commits
📝 Walkthrough
This PR extends the sandbox doctor diagnostic logic to support docker-driver-specific containers by conditionally skipping port-mapping checks. The shields module now exports a container resolution helper.
Changes: Docker-driver sandbox health checks
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Warning: there were issues while running some tools.
🔧 ESLint skipped: no ESLint configuration detected in root package.json.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/actions/sandbox/doctor.ts`:
- Around line 469-470: The current code falls back to the k3s gateway name by
using `|| \`openshell-cluster-${NEMOCLAW_GATEWAY_NAME}\`` when
`shields.resolveDockerDriverSandboxContainer(sandboxName)` returns null,
producing misleading hints; change the logic so you only call
`dockerInspectGateway(...)` if `_dockerDriverContainer` is non-null (i.e., check
`if (_dockerDriverContainer) {
checks.push(...dockerInspectGateway(_dockerDriverContainer, true)) }`) and do
not pass the k3s fallback name when the resolver returned null—this prevents
false positives and keeps `dockerInspectGateway` behavior tied to the actual
resolved container.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 54471912-b4b4-4c96-86f8-8e0f7623ca89
📒 Files selected for processing (3)
- src/lib/actions/sandbox/doctor.test.ts
- src/lib/actions/sandbox/doctor.ts
- src/lib/shields/index.ts
const _dockerDriverContainer = shields.resolveDockerDriverSandboxContainer(sandboxName);
checks.push(...dockerInspectGateway(_dockerDriverContainer || `openshell-cluster-${NEMOCLAW_GATEWAY_NAME}`, !!_dockerDriverContainer));
Avoid falling back to the k3s gateway name for docker-driver sandboxes when the resolver returns null.
At lines 469 and 470, a docker-driver sandbox with a stopped/non-running container can resolve to `null` (the resolver only considers running containers); the code then incorrectly falls back to `openshell-cluster-nemoclaw`. That yields a misleading architecture-specific failure/hint.
💡 Suggested fix
- const _dockerDriverContainer = shields.resolveDockerDriverSandboxContainer(sandboxName);
- checks.push(...dockerInspectGateway(_dockerDriverContainer || `openshell-cluster-${NEMOCLAW_GATEWAY_NAME}`, !!_dockerDriverContainer));
+ const dockerDriverContainer = shields.resolveDockerDriverSandboxContainer(sandboxName);
+ const sandboxEntry = registry.getSandbox(sandboxName);
+ const isDockerDriver = sandboxEntry?.openshellDriver === "docker";
+ const gatewayTarget = isDockerDriver
+ ? (dockerDriverContainer || `openshell-${sandboxName}`)
+ : `openshell-cluster-${NEMOCLAW_GATEWAY_NAME}`;
+ checks.push(...dockerInspectGateway(gatewayTarget, isDockerDriver));
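The target selection proposed in the suggested fix can be sketched as a pure function, which makes its three cases easier to compare. This is a minimal sketch under stated assumptions, not the code in `doctor.ts`: `pickGatewayTarget`, `resolveFromRunning`, the `Driver` type, and the `running` list are hypothetical stand-ins for the registry lookup and the container listing, and only the container-name patterns are taken from the review.

```typescript
// Hypothetical sketch: all names here are illustrative, not the project's API.
type Driver = "docker" | "kubernetes";

interface GatewayTarget {
  container: string;      // container name to hand to the inspect check
  skipPortCheck: boolean; // true when the k3s-only port check does not apply
}

// Return the first running container whose name starts with
// `openshell-<sandboxName>-`, or null when none is running.
function resolveFromRunning(sandboxName: string, running: string[]): string | null {
  const prefix = `openshell-${sandboxName}-`;
  for (const name of running) {
    if (name.indexOf(prefix) === 0) {
      return name;
    }
  }
  return null;
}

// Choose the inspect target from the driver mode, never from the mere
// presence or absence of a resolved container.
function pickGatewayTarget(
  driver: Driver,
  sandboxName: string,
  gatewayName: string,
  running: string[],
): GatewayTarget {
  if (driver === "docker") {
    const resolved = resolveFromRunning(sandboxName, running);
    // A stopped docker-driver sandbox still gets a docker-driver style name,
    // so the failure hint stays specific to that architecture.
    return { container: resolved ?? `openshell-${sandboxName}`, skipPortCheck: true };
  }
  return { container: `openshell-cluster-${gatewayName}`, skipPortCheck: false };
}
```

Keying the decision on the driver mode rather than on the resolver's return value is what keeps a stopped docker-driver sandbox from being reported against the k3s container name.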
Summary
`nemoclaw <sandbox> doctor` reports two false failures in docker-driver gateway mode (`openshellDriver === "docker"`):
- `dockerInspectGateway` looks for a hard-coded container name, `openshell-cluster-${NEMOCLAW_GATEWAY_NAME}`, that only exists in kubernetes-driver mode.
- The port check expects `30051/tcp` published on the host port. In docker-driver mode the gateway is a HOST process (not inside the sandbox container), so no container port mapping exists.
Both failures are diagnostic-only: the gateway itself works correctly in docker-driver mode, and `OpenShell status: connected` reports `ok` alongside the failed container check. The user is told the system is broken when it isn't.
Root cause
`dockerInspectGateway` in `src/lib/actions/sandbox/doctor.ts` was written for the legacy k3s-style cluster gateway architecture:
- Container name: `openshell-cluster-<gateway-name>`.
- Container port: `:30051`.
- Host port: `NEMOCLAW_GATEWAY_PORT`.
In docker-driver mode (`openshellDriver === "docker"`), the architecture is different:
- Sandbox container name: `openshell-${sandbox-name}-${uuid}`.
- `host.openshell.internal`.
`src/lib/shields/index.ts` already has a `resolveDockerDriverSandboxContainer(sandboxName)` helper that handles this correctly: it returns `null` unless `openshellDriver === "docker"`; in docker-driver mode it enumerates containers via `docker ps` and matches the `openshell-${sandbox-name}-` prefix. It's used internally for `kubectlExecArgv` vs. docker-driver routing in shields code. But it wasn't exported, so `doctor.ts` couldn't use it.
Fix
Two surgical changes (2 files, ~7 insertions, 2 deletions):
- `src/lib/shields/index.ts`: add `resolveDockerDriverSandboxContainer` to the named-export block at line ~1260. No function body changes; just exposing the existing implementation.
- `src/lib/actions/sandbox/doctor.ts`:
  - Signature: `dockerInspectGateway(containerName: string)` → `dockerInspectGateway(containerName: string, skipPortCheck: boolean = false)`. The new optional flag defaults to `false`, preserving all existing callers' behavior.
  - Added `if (skipPortCheck) return checks;` before the port-mapping block. When set, the function emits the "Docker container" check and returns, skipping the k3s-only port check.
  - `dockerInspectGateway` is now `export`ed (annotated `/** @internal — exported for unit tests; sole production consumer is runSandboxDoctor in this file. */`) so the new test file can import it directly.
Tests
Added `src/lib/actions/sandbox/doctor.test.ts` with two unit tests covering both branches of `dockerInspectGateway`:
- `skipPortCheck=true` (docker-driver path): asserts only one host command issued (`docker inspect`), no port probe; result contains the "Docker container" check, no "Port mapping" check.
- `skipPortCheck=false` (default, K3s path): asserts both inspect + port probes run as before; result contains both checks, port mapping reports `ok` when host mapping contains `:GATEWAY_PORT`.
Verified locally on Ubuntu 24.04 aarch64 (DGX Spark) with `npx vitest run src/lib/actions/sandbox/`: 37/37 tests pass across 7 files in 4ms.
Note for reviewers about the test import path: the test imports `dockerInspectGateway` from `../../../../dist/lib/actions/sandbox/doctor` (compiled output) rather than `./doctor` (source). This follows the existing precedent in `src/lib/actions/sandbox/status.test.ts`. The reason: `doctor.ts`'s transitive deps use bare CJS `require("./relative")` for sibling modules, which vitest cannot resolve to `.ts` on the fly without first running `npm run build:cli`. A long-term refactor converting these `require()` calls to ES imports would unblock direct `.ts` testing project-wide; happy to do that in a follow-up PR if desired.
Two other diligence notes from the test work:
- `npm install --ignore-scripts` is required to get vitest installed because the package's `prepare` script runs `npm install --omit=dev` after the build, which would strip vitest right after installing it. The `--ignore-scripts` flag avoids that. Worth a contributor-docs note.
- Mocking `child_process.spawnSync` for dist/-imported code required the `createRequire(import.meta.url)("node:child_process")` pattern. `vi.spyOn` against the ESM namespace fails with "Cannot redefine property: spawnSync" (the namespace is frozen). The CJS-cached module is mutable. Documented inline in the test for future contributors.
Driver-mode matrix (verified behavior)
All non-docker modes preserve current behavior exactly. Docker-driver mode now gets accurate diagnostics.
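The early-return guard behind this behavior can be sketched as follows. It is a minimal illustration under stated assumptions, not the actual `dockerInspectGateway` body: the `Check` and `Runner` types, the `inspectGatewaySketch` name, the injected `run` callback, and the use of `docker port <container> 30051/tcp` for the port probe are all hypothetical choices made so the example is self-contained and never shells out.

```typescript
// Hypothetical sketch: types, names, and the port-probe command are assumptions.
type Check = { label: string; status: "ok" | "fail" };
type Runner = (argv: string[]) => { ok: boolean; stdout: string };

function inspectGatewaySketch(
  containerName: string,
  hostPort: number,
  run: Runner,
  skipPortCheck: boolean = false,
): Check[] {
  const checks: Check[] = [];

  // Step 1: the container check runs in every driver mode.
  const inspect = run(["docker", "inspect", containerName]);
  checks.push({ label: "Docker container", status: inspect.ok ? "ok" : "fail" });

  // Step 2: in docker-driver mode the gateway is a host process, so there is
  // no container port mapping to probe; return before issuing a second command.
  if (skipPortCheck) return checks;

  // Step 3 (k3s path only): the host mapping must contain `:<hostPort>`.
  const port = run(["docker", "port", containerName, "30051/tcp"]);
  const mapped = port.ok && port.stdout.indexOf(`:${hostPort}`) !== -1;
  checks.push({ label: "Port mapping", status: mapped ? "ok" : "fail" });
  return checks;
}

// Build a fake runner that records every command it is asked to run.
function makeRecordingRunner(stdout: string): { run: Runner; calls: string[][] } {
  const calls: string[][] = [];
  const run: Runner = (argv) => {
    calls.push(argv);
    return { ok: true, stdout };
  };
  return { run, calls };
}
```

Because the flag defaults to `false`, a caller that passes only the container name, port, and runner still gets both checks, which is the property the two unit tests described under Tests rely on.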
Reproduction (before this PR)
On Ubuntu 24.04 aarch64 (DGX Spark) with NemoClaw `2026.4.24`, `openshellDriver === "docker"`, real sandbox container `openshell-jarvis-19f22332-…`:
After this PR:
The remaining warning is `Shields: down`, an environmental finding unrelated to this PR.
Also verified end-to-end post-patch in the same environment:
- `nemoclaw <sandbox> status` output unchanged.
Blast radius / risk
Minimal:
- `shields/index.ts` change is purely additive (one new line in the named-export block; no function body changes). Cannot affect any existing call paths.
- `doctor.ts` change adds an optional parameter (default preserves all existing callers' behavior) and a single early-return guard. The only call site that passes the new flag is the one updated. The function is annotated `@internal — exported for unit tests` so the `export` keyword shouldn't be relied on by external consumers.
Related findings (not in this PR)
While auditing, I also noticed `stopSandboxChannelsViaKubectl` in `src/lib/tunnel/services.ts:439` uses the same hard-coded `GATEWAY_CLUSTER_CONTAINER` constant. However, this is safe and doesn't need a patch because it's wrapped in a try-then-fallback pattern: `stopSandboxChannels` (line ~369) tries the kubectl-via-cluster-container path first, checks the result via `reportStopResult`, and falls through to `openshell sandbox exec` with the `GATEWAY_STOP_SCRIPT` if the kubectl call fails or returns false. In docker-driver mode, the kubectl call fails gracefully (container doesn't exist), and the fallback runs the verified bash script inside the actual sandbox to stop the gateway. The doctor's failure was the anomaly precisely because doctor had no equivalent fallback; the gateway-stop architecture already self-protects. If a maintainer would prefer this PR also include a defensive guard inside `stopSandboxChannelsViaKubectl` itself (return `null` early when `openshellDriver !== "kubernetes"`), I'm happy to add it. Let me know.
Also: I observed that OpenClaw's MCP runtime adapter (`openclaw/openclaw` repo, separate from this) silently drops the `headers` field on remote MCP server configs when the `transport` field is not set explicitly (even though `sse` is the documented default). This is the SSE analogue of openclaw/openclaw#65590 (streamable-http). Will file a separate issue against `openclaw/openclaw` referencing #65590, since this is the wrong repo for it.
Local commit history on this branch
- `6c9201062` fix(doctor): handle docker-driver gateway mode (resolver + skip k3s port check)
- `870844ef5` test(doctor): cover docker-driver skipPortCheck branch of dockerInspectGateway
2 commits, 2 files net, ~123 insertions / 2 deletions. Ready to squash-merge if you prefer a single-commit history.
Summary by CodeRabbit
Tests
Refactor