fix(channels): teardown session preset + sandbox auth files on remove#4001

Open
laitingsheng wants to merge 3 commits into main from fix/channels-remove-full-teardown-3998

Conversation

@laitingsheng
Contributor

@laitingsheng laitingsheng commented May 21, 2026

Summary

nemoclaw <name> channels remove <channel> left two pieces of state behind, so the channel resurrected itself on the next rebuild: policy-list re-listed the channel's preset, and the in-sandbox bot library (Baileys for WhatsApp) auto-reconnected from its saved auth files. The operator saw an empty channels status, but DMs still got a reply.

This change strips the channel from session.policyPresets so that the preset reconciliation in onboard --resume cannot re-apply the preset. It also runs rm -rf on the channel's durable state inside the sandbox before queuing the rebuild, so the state_dirs backup snapshots an empty directory.
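As a rough illustration of the first half of the fix, the preset strip can be sketched as a pure filter. SessionLike and dropChannelFromPresets are hypothetical names used only for this sketch; the helper in this PR is dropChannelFromSessionPolicyPresets(), and its real signature and the full session shape are not reproduced here.

```typescript
// Hypothetical, minimal session shape; the real session object carries more fields.
interface SessionLike {
  policyPresets?: string[];
}

// Return a copy of the session with `channel` removed from policyPresets.
// Other presets keep their original order; a missing list is treated as empty.
function dropChannelFromPresets(session: SessionLike, channel: string): SessionLike {
  const presets = session.policyPresets ?? [];
  return { ...session, policyPresets: presets.filter((preset) => preset !== channel) };
}

const exampleSession: SessionLike = { policyPresets: ["npm", "whatsapp", "pypi"] };
const strippedSession = dropChannelFromPresets(exampleSession, "whatsapp");
console.log(JSON.stringify(strippedSession.policyPresets)); // ["npm","pypi"]
```

Returning a copy rather than mutating in place is a choice made for the sketch; it keeps the before and after states easy to compare in a test.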

Related Issue

Fixes #3998

Changes

  • src/lib/actions/sandbox/policy-channel.ts:
    • Add dropChannelFromSessionPolicyPresets() — strips the channel from session.policyPresets so policy-selection.ts:184 does not target it on resume.
    • Add clearSandboxChannelDurableState() — agent-aware sandbox-exec rm -rf of the channel's durable state dir. Hermes (platforms/ umbrella) gets ${configDir}/platforms/<channel>/; OpenClaw (channel name declared as its own state_dir) gets ${configDir}/<channel>/. Path is validated to live under /sandbox/.<x>/ with no .. segments before interpolation into the rm command.
    • Wire both into removeSandboxChannel after removeChannelPresetIfPresent and before promptAndRebuild so the rebuild's backup excludes the auth files.
  • test/channels-remove-full-teardown.test.ts: regression test — covers OpenClaw + Hermes whatsapp removal (session.policyPresets stripped, sandbox rm -rf issued for the right path, ordering invariant: clear precedes rebuild) and a token-based channel (telegram) confirming the session-preset strip applies symmetrically.
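The agent-aware path selection and the safety check described in the list above can be sketched roughly as follows. The function names, the AgentKind type, and the exact character class in the regular expression are assumptions made for illustration; only the two directory layouts and the "under /sandbox/.<x>/, no .. segments" rule come from this description.

```typescript
type AgentKind = "openclaw" | "hermes";

// Assumed layout, taken from the description above:
//   Hermes   -> <configDir>/platforms/<channel>/
//   OpenClaw -> <configDir>/<channel>/
function channelStateDir(agent: AgentKind, configDir: string, channel: string): string {
  const base = configDir.replace(/\/+$/, ""); // drop trailing slashes
  return agent === "hermes" ? `${base}/platforms/${channel}/` : `${base}/${channel}/`;
}

// Accept only absolute paths of the form /sandbox/.<name>/<segment>[/<segment>...]
// with no "." or ".." segments. The character class is an illustrative guess.
function isSafeChannelStatePath(candidate: string): boolean {
  const segments = candidate.split("/");
  if (segments.some((segment) => segment === "." || segment === "..")) return false;
  return /^\/sandbox\/\.[A-Za-z0-9_-]+(\/[A-Za-z0-9._-]+)+\/?$/.test(candidate);
}

console.log(channelStateDir("hermes", "/sandbox/.hermes", "whatsapp")); // /sandbox/.hermes/platforms/whatsapp/
console.log(channelStateDir("openclaw", "/sandbox/.openclaw/", "whatsapp")); // /sandbox/.openclaw/whatsapp/
console.log(isSafeChannelStatePath("/sandbox/.openclaw/whatsapp/")); // true
console.log(isSafeChannelStatePath("/sandbox/.openclaw/../../etc/")); // false
```

Requiring at least one segment below the agent config directory means the check in this sketch also rejects the config directory itself, so a bad derivation cannot wipe the whole agent state.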

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

  • Bug Fixes

    • Channel removal now reliably clears durable in-sandbox channel state for QR-paired channels, refuses rebuild if cleanup cannot reach the sandbox, and always removes the channel from session presets to avoid re-applying on rebuild. Token-based channels get a best-effort in-sandbox cleanup.
  • Tests

    • Added regression tests covering full teardown, SSH fallback, abort-on-cleanup-failure, and token-channel removal flows.
  • Documentation

    • Updated docs and command reference to describe cleanup, rebuild behavior, and failure semantics for channel removal.


Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 21, 2026

📝 Walkthrough

Adds in-sandbox durable per-channel cleanup (exec + SSH fallback), removes channel names from session.policyPresets, updates remove ordering to abort on QR cleanup failure, adds regression tests for WhatsApp/Telegram, and documents the new behavior.

Changes

Channel Removal Full Teardown

| Layer | File(s) | Summary |
|---|---|---|
| Sandbox cleanup imports | src/lib/actions/sandbox/policy-channel.ts | Add shellQuote and executeSandboxExecCommand/executeSandboxCommand imports to support safe in-sandbox command execution. |
| Channel state cleanup utilities | src/lib/actions/sandbox/policy-channel.ts | Implement helpers to compute agent-specific durable session dirs, validate /sandbox/.-scoped safety, execute rm -rf inside the sandbox with exec-first and SSH-fallback behavior, and remove the channel from session.policyPresets. |
| Integrate cleanup into removeSandboxChannel | src/lib/actions/sandbox/policy-channel.ts | For QR-paired channels, clear in-sandbox durable state before proceeding and abort on failure; after gateway/registry removal, always drop the channel from session.policyPresets and perform best-effort durable clear for token channels. |
| Documentation updates | docs/manage-sandboxes/messaging-channels.mdx, docs/reference/commands.mdx | Document destructive in-sandbox cleanup for QR channels (OpenClaw vs Hermes paths), SSH fallback, removal from session.policyPresets, and rebuild being blocked if the sandbox is unreachable. |
| Test harness and mocking infrastructure | test/channels-remove-full-teardown.test.ts | Add a generated-subprocess test harness that stubs dist/lib modules, filters messaging-channel env vars, captures sandbox exec/SSH invocations, and records console markers to assert ordering and side effects. |
| WhatsApp channel removal regression tests | test/channels-remove-full-teardown.test.ts | Verify QR-paired WhatsApp removal performs in-sandbox rm -rf for the agent-specific path, calls removePreset('whatsapp') once, removes whatsapp from session.policyPresets, and logs cleared in-sandbox state before promptAndRebuild; includes SSH fallback case. |
| QR cleanup failure-mode test | test/channels-remove-full-teardown.test.ts | Simulate both exec and SSH cleanup failures: assert non-zero exit, no promptAndRebuild, no preset removal, no registry updates, and unchanged session.policyPresets. |
| Telegram channel removal regression test | test/channels-remove-full-teardown.test.ts | Verify token-based Telegram removal drops telegram from session.policyPresets, preserves other presets, and empties sandbox messagingChannels in the registry update. |
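The ordering invariant that the harness checks through recorded console markers (clear precedes rebuild) can be expressed with a small helper along these lines. The marker strings and the occursBefore name are hypothetical; the actual markers emitted by the test harness are not shown in this summary.

```typescript
// True only when both markers were recorded and `first` appears before `second`.
function occursBefore(markers: string[], first: string, second: string): boolean {
  const firstIndex = markers.indexOf(first);
  const secondIndex = markers.indexOf(second);
  return firstIndex !== -1 && secondIndex !== -1 && firstIndex < secondIndex;
}

// Hypothetical marker names standing in for whatever the harness records.
const capturedMarkers = ["cleared-in-sandbox-state", "preset-removed", "prompt-and-rebuild"];
console.log(occursBefore(capturedMarkers, "cleared-in-sandbox-state", "prompt-and-rebuild")); // true
console.log(occursBefore(capturedMarkers, "prompt-and-rebuild", "cleared-in-sandbox-state")); // false
```

Treating a missing marker as a failure keeps the check from passing vacuously when the cleanup step never ran.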

Sequence Diagram(s)

sequenceDiagram
  participant CLI as NemoClaw CLI
  participant RemoveFlow as removeSandboxChannel
  participant Exec as openshell sandbox exec
  participant SSH as openshell sandbox ssh
  participant Registry as Sandboxes Registry
  participant Session as onboardSession
  participant Rebuilder as promptAndRebuild

  CLI->>RemoveFlow: request channels remove <channel>
  RemoveFlow->>Exec: run "rm -rf <agent-session-dir> && echo 'Cleared in-sandbox'"
  alt exec success
    Exec-->>RemoveFlow: success sentinel
  else exec null / unreachable
    RemoveFlow->>SSH: perform SSH rm -rf attempt
    SSH-->>RemoveFlow: success / failure
  end
  alt success (QR)
    RemoveFlow->>Registry: remove channel from messagingChannels
    Registry-->>RemoveFlow: ack
    RemoveFlow->>Session: drop channel from session.policyPresets
    Session-->>RemoveFlow: ack
    RemoveFlow->>Rebuilder: queue promptAndRebuild
  else failure (QR)
    RemoveFlow-->>CLI: abort with non-zero exit (do not rebuild)
  end
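The exec-first, SSH-fallback branch in the diagram can be sketched as follows. The SandboxTransport type and clearWithFallback are hypothetical stand-ins: each transport is modelled here as a synchronous function that returns captured stdout, or null when the sandbox is unreachable, which is a simplification and not the signature of executeSandboxExecCommand or executeSandboxCommand.

```typescript
// Hypothetical transport: returns captured stdout, or null if the sandbox was unreachable.
type SandboxTransport = (command: string) => string | null;

const CLEARED_SENTINEL = "Cleared in-sandbox";

// Try exec first; fall back to SSH when exec returns null or the sentinel is missing.
function clearWithFallback(
  command: string,
  exec: SandboxTransport,
  ssh: SandboxTransport,
): { ok: boolean; via: "exec" | "ssh" | "none" } {
  const execOut = exec(command);
  if (execOut !== null && execOut.includes(CLEARED_SENTINEL)) return { ok: true, via: "exec" };
  const sshOut = ssh(command);
  if (sshOut !== null && sshOut.includes(CLEARED_SENTINEL)) return { ok: true, via: "ssh" };
  return { ok: false, via: "none" };
}

const unreachableExec: SandboxTransport = () => null;
const workingSsh: SandboxTransport = () => "Cleared in-sandbox\n";
const clearCommand = "rm -rf -- '/sandbox/.openclaw/whatsapp/' && echo 'Cleared in-sandbox'";
console.log(JSON.stringify(clearWithFallback(clearCommand, unreachableExec, workingSsh))); // {"ok":true,"via":"ssh"}
```

For a QR-paired channel, a caller would abort with a non-zero exit when ok is false, matching the failure branch of the diagram.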

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#3672: Overlaps on src/lib/actions/sandbox/policy-channel.ts preset un-application vs durable cleanup approach.
  • NVIDIA/NemoClaw#3696: E2E tests exercising add/remove lifecycle for messaging channels that validate preset and durable-state behavior.
  • NVIDIA/NemoClaw#3392: Prior QR onboarding/removal flow changes in the same policy-channel.ts paths.

Suggested labels

NemoClaw CLI, enhancement: messaging

Suggested reviewers

  • ericksoa
  • cv
  • jyaunches

Poem

🐰 I nudge the sandbox, gentle and neat,
I clear the crumbs where sessions meet.
WhatsApp hops away, Telegram too,
No stale keys to wander through.
Fresh rebuilds now start clean and sweet.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: fixing channel teardown by removing the session preset and clearing sandbox auth files during removal. |
| Linked Issues check | ✅ Passed | The PR fully implements all coding requirements from issue #3998: removes channel from session.policyPresets, clears in-sandbox durable state with path validation, handles multiple platforms/layouts, provides SSH fallback transport, and preserves correct teardown ordering before rebuild. |
| Out of Scope Changes check | ✅ Passed | All changes directly address issue #3998 objectives: policy-channel.ts implements cleanup logic, test file provides regression coverage, and documentation updates describe the new behavior. No unrelated or out-of-scope modifications. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@laitingsheng laitingsheng added the fix and Integration: WhatsApp labels May 21, 2026
@github-actions
Contributor

github-actions Bot commented May 21, 2026

E2E Advisor Recommendation

Required E2E: channels-stop-start-e2e, channels-add-remove-e2e
Optional E2E: messaging-providers-e2e

Dispatch hint: channels-stop-start-e2e,channels-add-remove-e2e

Auto-dispatched E2E: channels-stop-start-e2e, channels-add-remove-e2e via nightly-e2e.yaml at 9ce04c1c15f4cab74464923c47396e16583f739b (nightly run)


Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • channels-stop-start-e2e (high (~120 min timeout; two sandbox scenarios)): Highest-signal existing E2E for this PR: exercises channels remove on a live sandbox for OpenClaw and Hermes across telegram/discord/wechat/slack/whatsapp, verifies provider deletion, registry cleanup, policy preset removal, rebuild queuing, and post-remove rebuild state.
  • channels-add-remove-e2e (medium-high (~75 min timeout)): Focused add/remove lifecycle regression for token-backed channels and channel policy cleanup. Useful merge-blocking guard because removeSandboxChannel now also mutates session.policyPresets and best-effort durable state cleanup for token-backed channels.

Optional E2E

  • messaging-providers-e2e (high (~75 min timeout)): Adjacent confidence for WhatsApp QR-only channel add/rebuild, provider absence, config baking, no host credential leakage, and policy survival. It does not directly test channels remove, so it is optional rather than merge-blocking for this diff.

New E2E recommendations

  • QR-paired channel remove durable-state teardown (high): Existing E2E covers live channels remove whatsapp success broadly, but does not assert that /sandbox/.openclaw/whatsapp/ or /sandbox/.hermes/platforms/whatsapp/ is deleted before rebuild, nor that cleanup failure aborts before registry/policy mutation. The new Vitest regression covers mocks only.
    • Suggested test: Add a focused E2E that seeds fake WhatsApp durable state directories in OpenClaw and Hermes sandboxes, runs nemoclaw <sandbox> channels remove whatsapp, asserts the directories are gone before/after rebuild, asserts the whatsapp policy preset is not re-applied on resume, and includes a negative stopped-sandbox case that exits non-zero without mutating registry/policy.
  • sandbox exec fallback transport (medium): The implementation falls back from openshell sandbox exec to SSH when the success sentinel is missing, but current E2E jobs do not force the exec-wrapper failure path.
    • Suggested test: Add a hermetic or live E2E fixture that makes openshell sandbox exec fail while SSH remains reachable, then verifies channels remove whatsapp still clears durable state and proceeds to rebuild.
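The "seed fake durable state, remove, assert gone" shape of the suggested test can be sketched locally with a temporary directory standing in for the sandbox filesystem. This is only an outline of the assertion pattern: seedFakeState, clearState, and the creds.json file name are hypothetical, and a real E2E would run nemoclaw <sandbox> channels remove whatsapp against a live sandbox instead of calling fs.rmSync.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Create <root>/<relDir>/creds.json so there is something for cleanup to delete.
// "creds.json" is a made-up file name for this sketch.
function seedFakeState(root: string, relDir: string): string {
  const dir = path.join(root, relDir);
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, "creds.json"), "{}");
  return dir;
}

// Local stand-in for the in-sandbox rm -rf.
function clearState(dir: string): void {
  fs.rmSync(dir, { recursive: true, force: true });
}

const fakeSandboxRoot = fs.mkdtempSync(path.join(os.tmpdir(), "fake-sandbox-"));
const seededDirs = [
  seedFakeState(fakeSandboxRoot, ".openclaw/whatsapp"),
  seedFakeState(fakeSandboxRoot, ".hermes/platforms/whatsapp"),
];
for (const dir of seededDirs) {
  clearState(dir);
  console.log(path.relative(fakeSandboxRoot, dir), "exists after cleanup:", fs.existsSync(dir));
}
fs.rmSync(fakeSandboxRoot, { recursive: true, force: true });
```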

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: channels-stop-start-e2e,channels-add-remove-e2e

@github-actions
Contributor

github-actions Bot commented May 21, 2026

PR Review Advisor

Recommendation: blocked
Confidence: medium
Analyzed HEAD: 9ce04c1c15f4cab74464923c47396e16583f739b
Findings: 3 blocker(s), 3 warning(s), 0 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations: Review was read-only; no tests, package-manager commands, sandbox commands, PR scripts, or E2E jobs were executed by this advisor. Analysis relied on the provided deterministic GitHub/CI context and the supplied diff, with targeted read-only inspection of changed files. CI and E2E results were still pending for the specified head SHA 9ce04c1. The E2E Advisor comment found in PR discussion auto-dispatched required jobs for d371b33; no required E2E pass evidence was available for the current head SHA. A selective E2E result for prior SHA d371b33 reported channels-stop-start-e2e failure, but that result is not for the requested head SHA 9ce04c1. No live WhatsApp pairing/removal/rebuild/DM evidence was available to verify the final user-visible acceptance condition. Open PR overlap data indicates docs/reference/commands.mdx is concurrently touched by multiple active PRs, so final drift should be rechecked before merge.


Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: 9ce04c1c15f4cab74464923c47396e16583f739b
Recommendation: blocked
Confidence: medium

The PR targets the live channel-removal bug path and adds useful teardown tests/docs, but merge is blocked by pending CI/blocked merge state, missing required E2E evidence for head 9ce04c1, and the repository monolith-growth gate.

Gate status

  • CI: pending — GraphQL statusCheckRollup for 9ce04c1 shows pending/in-progress/queued contexts including cli-parity, E2E recommendation, wsl-e2e, macos-e2e, PR review advisor, CodeQL, unit-vitest-linux, checks, ShellCheck SARIF, build-sandbox-images, build-sandbox-images-arm64, and CodeRabbit. Deterministic context reports 13 status context(s) appear pending.
  • Mergeability: fail — GitHub reports mergeStateStatus=BLOCKED for PR fix(channels): teardown session preset + sandbox auth files on remove #4001 at head 9ce04c1.
  • Review threads: unknown — No review thread state was available; GraphQL reviewThreads.nodes is empty but reviewDecision is REVIEW_REQUIRED and CodeRabbit is pending.
  • Risky code tested: fail — The changed source modifies sandbox runtime/channel teardown behavior in src/lib/actions/sandbox/policy-channel.ts. Unit regression tests were added, but trusted testDepth says e2e_required and the required E2E jobs have not passed for this head SHA.

🔴 Blockers

  • Required checks are not complete for the reviewed head SHA: The PR is not currently mergeable because required CI/status contexts are still pending, queued, or in progress, and GitHub reports the merge state as BLOCKED.
    • Recommendation: Wait for all required CI, security, platform, and review checks to complete successfully for 9ce04c1, then re-run review against that exact SHA.
    • Evidence: GraphQL statusCheckRollup for head 9ce04c1 includes pending/in-progress contexts; mergeStateStatus=BLOCKED.
  • Sandbox-runtime teardown needs required E2E evidence for this head SHA (src/lib/actions/sandbox/policy-channel.ts:764): The change depends on real OpenShell sandbox exec behavior, SSH fallback, OpenClaw/Hermes state_dirs backup timing, rebuild ordering, and QR-paired WhatsApp durable auth deletion. The added tests mock sandbox-exec and rebuild behavior, so they cannot prove the real sandbox state is deleted before backup or that WhatsApp stops responding after rebuild.
    • Recommendation: Confirm the E2E Advisor required jobs pass for 9ce04c1: channels-stop-start-e2e and channels-add-remove-e2e. Consider adding the advisor-suggested focused durable-state teardown E2E that seeds fake auth/session files and verifies they are absent before/after rebuild and that unreachable sandbox cleanup aborts.
    • Evidence: E2E Advisor required jobs: channels-stop-start-e2e and channels-add-remove-e2e. The visible selective E2E result for d371b33 had channels-add-remove-e2e success and channels-stop-start-e2e failure; no pass evidence is available for current head 9ce04c1.
  • Monolith growth exceeds repository threshold (src/lib/actions/sandbox/policy-channel.ts:1): The already-large policy-channel action module grew further. Deterministic monolith analysis marks growth of 20 or more lines in this file as a blocker.
    • Recommendation: Extract the new durable-state teardown/session-preset helpers into a focused helper module or offset the growth with equivalent simplification before merge.
    • Evidence: monolithDeltas reports src/lib/actions/sandbox/policy-channel.ts baseLines=947 headLines=1034 delta=87 severity=blocker.

🟡 Warnings

  • Destructive sandbox cleanup path is guarded but lacks negative-path evidence (src/lib/actions/sandbox/policy-channel.ts:746): The new cleanup constructs and executes a destructive rm -rf command inside the sandbox. The path is derived from trusted agent manifests and canonical channel names, constrained under /sandbox/., rejects '..', and is shell-quoted. However, if future manifest changes produce only unsafe derived paths, the current filter can reduce the path list to empty and treat cleanup as unnecessary, and there is no unit coverage for unsafe derived path rejection.
    • Recommendation: Add negative tests for unsafe derived paths and consider failing closed for QR-paired cleanup when a configured state path exists but is rejected by validation.
    • Evidence: clearSandboxChannelDurableState filters getSandboxChannelStatePaths(...).filter(isSafeChannelStatePath) and returns true when paths.length === 0; tests cover OpenClaw/Hermes happy paths and cleanup failure but not unsafe derived paths.
  • Tests cover mocked teardown but not real backup semantics (test/channels-remove-full-teardown.test.ts:1): The new tests are useful and cover OpenClaw/Hermes path selection, session.policyPresets stripping, SSH fallback, cleanup-failure abort, and cleanup-before-rebuild ordering. They still rely on mocked dist modules and subprocess stubs, so they cannot verify real sandbox filesystem deletion, state_dirs backup exclusion, or WhatsApp post-removal non-response.
    • Recommendation: Keep the unit tests, but gate merge on required E2E for this head SHA and add the E2E Advisor's sentinel-file teardown scenario if feasible.
    • Evidence: test/channels-remove-full-teardown.test.ts stubs executeSandboxExecCommand, executeSandboxCommand, registry, policies, and prompt/rebuild behavior; E2E Advisor explicitly recommends a QR-paired messaging channel teardown E2E.
  • Command reference has active open-PR overlap (docs/reference/commands.mdx:653): The PR touches docs/reference/commands.mdx, which is concurrently touched by multiple active open PRs. This is not inherently wrong, but it increases rebase/drift risk for command-reference edits.
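The fail-open behaviour flagged in the first warning, and the fail-closed alternative it recommends, can be contrasted in a small sketch. planCleanup, CleanupPlan, and the failClosed flag are hypothetical names; the sketch only models the decision about which paths to clear, not the cleanup itself.

```typescript
type CleanupPlan =
  | { kind: "nothing-to-clear" }
  | { kind: "clear"; paths: string[] }
  | { kind: "refuse"; rejected: string[] };

// With failClosed=false, rejected paths are silently dropped (the behaviour the
// warning describes). With failClosed=true, any rejected configured path refuses cleanup.
function planCleanup(
  configured: string[],
  isSafe: (candidate: string) => boolean,
  failClosed: boolean,
): CleanupPlan {
  if (configured.length === 0) return { kind: "nothing-to-clear" };
  const rejected = configured.filter((candidate) => !isSafe(candidate));
  if (failClosed && rejected.length > 0) return { kind: "refuse", rejected };
  const safe = configured.filter((candidate) => isSafe(candidate));
  return safe.length === 0 ? { kind: "nothing-to-clear" } : { kind: "clear", paths: safe };
}

// Simplified stand-in for the real path validation.
const looksSafe = (candidate: string): boolean =>
  candidate.startsWith("/sandbox/.") && !candidate.split("/").includes("..");

const unsafeOnly = ["/sandbox/.openclaw/../etc"];
console.log(JSON.stringify(planCleanup(unsafeOnly, looksSafe, false))); // {"kind":"nothing-to-clear"}
console.log(JSON.stringify(planCleanup(unsafeOnly, looksSafe, true))); // {"kind":"refuse","rejected":["/sandbox/.openclaw/../etc"]}
```

A negative unit test of the kind the warning asks for could assert the "refuse" outcome for QR-paired channels and confirm that no rebuild is queued afterwards.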

🔵 Suggestions

  • None.

Acceptance coverage

  • partial — After running nemoclaw test333 channels remove whatsapp and completing the rebuild, the WhatsApp channel continues to function — the AI agent still receives and responds to incoming WhatsApp DMs.: The diff attempts to prevent this by clearing QR-paired durable state before registry/policy/session mutation and rebuild. No completed live WhatsApp pairing/remove/rebuild/DM E2E evidence is available for head 9ce04c1.
  • partial — Three cleanup steps are missing: policy preset removal, Baileys session teardown, and durable state directory cleanup.: Policy preset removal is preserved via removeChannelPresetIfPresent; session.policyPresets cleanup is added via dropChannelFromSessionPolicyPresets; durable state cleanup is added via clearSandboxChannelDurableState using in-sandbox rm -rf. Baileys teardown is indirect through deleting durable auth/session files and still needs real E2E proof.
  • unknown — 1. nemoclaw onboard --name test333 (complete onboard with NVIDIA Cloud API): No completed current-head E2E evidence for this setup step was provided.
  • unknown — 2. nemoclaw test333 channels add whatsapp (add WhatsApp channel, answer y to rebuild): No completed WhatsApp add/rebuild E2E evidence for head 9ce04c1 was provided.
  • unknown — 3. nemoclaw test333 connect, then openclaw channels login --channel whatsapp (complete QR pairing, confirm channels status shows connected): The unit tests do not perform real QR pairing or verify Baileys session creation; no current-head E2E output is available.
  • unknown — 4. Exit sandbox: No real sandbox session flow was exercised in the provided evidence.
  • partial — 5. nemoclaw test333 channels remove whatsapp (answer y to rebuild): removeSandboxChannel now clears QR state before registry/policy/session mutation, removes gateway/registry state, removes the applied policy preset, strips session.policyPresets, and then prompts/rebuilds. Tests invoke removeSandboxChannel under mocks and assert cleanup precedes rebuild.
  • met — 6. cat ~/.nemoclaw/sandboxes.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['sandboxes']['test333']['messagingChannels'])": The removal path still calls applyChannelRemoveToGatewayAndRegistry. The new telegram test asserts registry update writes messagingChannels=[]; the issue also stated this path was already clearing correctly for WhatsApp.
  • partial — 7. nemoclaw test333 policy-list: removeChannelPresetIfPresent removes the applied channel preset and dropChannelFromSessionPolicyPresets prevents resume from reapplying it. Tests assert removePreset('whatsapp') and session.policyPresets no longer contains whatsapp, but no real policy-list output is available for this head.
  • unknown — 8. nemoclaw test333 connect, then openclaw channels status: No unit or current-head E2E evidence invokes real openclaw channels status after remove/rebuild.
  • unknown — 9. Send a WhatsApp DM to the paired phone number: No live WhatsApp DM non-response evidence is available. This remains the final user-visible acceptance condition requiring E2E or manual verification.
  • met — Step 6: messagingChannels = [] (channel unregistered): The existing registry removal path is preserved, and tests verify a registry update to an empty messagingChannels array for a token channel.
  • partial — Step 7: whatsapp preset absent from policy-list (symmetric with channels add which auto-applies it): The diff calls removeChannelPresetIfPresent and dropChannelFromSessionPolicyPresets; mocked tests assert removePreset('whatsapp') and session preset stripping. Actual policy-list output after rebuild is not evidenced for 9ce04c1.
  • unknown — Step 8: No WhatsApp channel shown in openclaw channels status: No current-head E2E output for openclaw channels status after removal was provided.
  • unknown — Step 9: No AI response — channel fully deactivated after removal: The code deletes /sandbox/.openclaw/whatsapp or /sandbox/.hermes/platforms/whatsapp before rebuild for QR-paired channels, but live WhatsApp non-response is not proven.
  • met — Step 6: messagingChannels = [] — registry cleared correctly: The PR preserves applyChannelRemoveToGatewayAndRegistry and tests verify registry removal behavior for telegram.
  • partial — Step 7: whatsapp preset STILL ACTIVE in policy-list (● whatsapp shown): The PR addresses this symptom by removing the applied preset and stripping session.policyPresets. Unit tests cover intended state mutation, but real post-rebuild policy-list evidence is pending.
  • unknown — Step 8: No WhatsApp channel listed in openclaw channels status (misleading — appears removed): No direct post-fix openclaw channels status evidence is available.
  • unknown — Step 9: AI agent STILL REPLIES to WhatsApp DMs — channel is fully functional despite removal: The intended fix deletes durable auth state before rebuild and aborts QR removal if cleanup fails, but no live WhatsApp DM verification is available.
  • unknown — Post-rebuild log from nemoclaw test333 logs shows Baileys reconnecting:: No post-fix logs from current head were provided to show Baileys no longer reconnects.

Security review

  • pass — 1. Secrets and Credentials: No hardcoded secrets, API keys, passwords, or real credentials were added. The change deletes channel auth/session state inside the sandbox; test fixtures use stub values and isolate ambient messaging env vars.
  • warning — 2. Input Validation and Data Sanitization: The new cleanup path builds a destructive shell command (rm -rf -- <path>) for sandbox execution. It derives paths from trusted agent definitions and canonical channel names, requires /sandbox/., rejects .., applies a restrictive regex, and shell-quotes paths. Because this is destructive sandbox lifecycle code and unsafe-path/future-manifest cases are not directly tested, keep as WARNING.
  • pass — 3. Authentication and Authorization: No new endpoints, auth flows, or authorization checks are introduced. The change removes stale messaging auth/session material and egress presets rather than granting access.
  • pass — 4. Dependencies and Third-Party Libraries: No new dependencies or package version changes are present in the changed files.
  • pass — 5. Error Handling and Logging: For QR-paired channels, cleanup failure now aborts before registry/policy/session mutation and rebuild, avoiding the prior false-success path. Logged errors identify channel/path context without printing secrets. Token-based channel cleanup remains best-effort, which is acceptable because token removal is the primary auth revocation for those channels.
  • pass — 6. Cryptography and Data Protection: No cryptographic algorithms or data-protection primitives are modified. The change improves data minimization by deleting durable channel session state.
  • pass — 7. Configuration and Security Headers: No HTTP headers, CORS/CSP, Dockerfile, port exposure, or container-user configuration is changed. The cleanup target is constrained to agent config/state directories under /sandbox/..
  • warning — 8. Security Testing: Regression tests cover mocked happy paths, QR cleanup failure abort, SSH fallback, OpenClaw/Hermes WhatsApp path differences, token-channel session preset stripping, and clear-before-rebuild ordering. They do not test unsafe derived paths or real sandbox/rebuild state_dirs semantics. Required E2E has not passed for this head SHA.
  • warning — 9. Holistic Security Posture: The intent improves security by removing stale QR credentials and egress policy state. Remaining operational risk is that real OpenShell sandbox exec, backup timing, and WhatsApp non-response are unproven for the current head; the sandbox lifecycle module also continues to grow.
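For reference, the quoting step called out in item 2 can be sketched as below. quoteForPosixShell is a generic POSIX single-quote escaper written for this illustration and is not the repository's shellQuote helper; the command shape (rm -rf -- <path> followed by an echoed sentinel) follows the review text and the sequence diagram, but the exact string built in the PR may differ.

```typescript
// POSIX single-quote escaping: close the quote, emit an escaped quote, reopen.
function quoteForPosixShell(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Build the destructive command only from an already validated path.
// "--" stops option parsing, so a path starting with "-" cannot be read as a flag.
function buildClearCommand(validatedPath: string): string {
  return `rm -rf -- ${quoteForPosixShell(validatedPath)} && echo 'Cleared in-sandbox'`;
}

console.log(buildClearCommand("/sandbox/.openclaw/whatsapp/"));
// rm -rf -- '/sandbox/.openclaw/whatsapp/' && echo 'Cleared in-sandbox'
```

Quoting is a second line of defence here: the path check should already have rejected anything containing quotes or spaces before the command is built.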

Test / E2E status

  • Test depth: e2e_required — Runtime/sandbox/infrastructure paths need real execution coverage: docs/manage-sandboxes/messaging-channels.mdx, docs/reference/commands.mdx, src/lib/actions/sandbox/policy-channel.ts. Added unit tests are meaningful but mocked; they cannot prove real OpenShell sandbox exec, filesystem deletion, state_dirs backup exclusion, platform behavior, or WhatsApp post-removal non-response.
  • E2E Advisor: missing
  • Required E2E jobs: channels-stop-start-e2e, channels-add-remove-e2e
  • Missing for analyzed SHA: channels-stop-start-e2e, channels-add-remove-e2e

✅ What looks good

  • The PR patches code that still exists and directly overlaps the linked bug path in src/lib/actions/sandbox/policy-channel.ts; no rename drift was reported for the changed files.
  • The fix addresses both persistence layers called out by the issue: applied/session policy presets and in-sandbox durable channel state.
  • QR-paired channel cleanup now happens before registry/policy/session mutation and before rebuild; on cleanup failure it exits non-zero, which aligns with the updated docs and avoids half-removed state.
  • The new regression test covers OpenClaw and Hermes WhatsApp state path differences, SSH fallback, token-based channel session preset stripping, cleanup failure abort behavior, and cleanup-before-rebuild ordering.
  • The shell command path is constrained to /sandbox/. agent config directories and uses shellQuote before interpolation.
  • Documentation was updated to explain destructive QR-paired cleanup behavior and failure handling for unreachable sandboxes.

Review completeness

  • Review was read-only; no tests, package-manager commands, sandbox commands, PR scripts, or E2E jobs were executed by this advisor.
  • Analysis relied on the provided deterministic GitHub/CI context and the supplied diff, with targeted read-only inspection of changed files.
  • CI and E2E results were still pending for the specified head SHA 9ce04c1.
  • The E2E Advisor comment found in PR discussion auto-dispatched required jobs for d371b33; no required E2E pass evidence was available for the current head SHA.
  • A selective E2E result for prior SHA d371b33 reported channels-stop-start-e2e failure, but that result is not for the requested head SHA 9ce04c1.
  • No live WhatsApp pairing/removal/rebuild/DM evidence was available to verify the final user-visible acceptance condition.
  • Open PR overlap data indicates docs/reference/commands.mdx is concurrently touched by multiple active PRs, so final drift should be rechecked before merge.
  • Human maintainer review required: yes

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
test/channels-remove-full-teardown.test.ts (2)

221-261: ⚡ Quick win

Align Telegram fixture state and assert full channel unregister outcome.

This case only asserts session.policyPresets. Because preamble sessionStore.messagingChannels is still ["whatsapp"], the test can miss regressions in channel list teardown for the removed Telegram channel.

Proposed patch
 const ctx = module.exports;
+ctx.sessionStore.messagingChannels = ["telegram"];
 const registryOverride = require(${JSON.stringify(path.join(repoRoot, "dist", "lib", "state/registry.js"))});
 registryOverride.getSandbox = () => ({
   name: "test-sb",
   agent: "openclaw",
@@
     await ctx.channelModule.removeSandboxChannel("test-sb", { channel: "telegram" });
     process.stdout.write("\\n__RESULT__" + JSON.stringify({
       sessionPolicyPresets: ctx.sessionStore.policyPresets,
+      sessionMessagingChannels: ctx.sessionStore.messagingChannels,
     }) + "\\n");
@@
     assert.deepEqual(
       payload.sessionPolicyPresets,
       ["npm", "pypi", "brew"],
       "other presets must remain after removing a token-based channel",
     );
+    assert.deepEqual(
+      payload.sessionMessagingChannels,
+      [],
+      "messagingChannels should be empty after removing the only channel",
+    );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/channels-remove-full-teardown.test.ts` around lines 221 - 261, The test
only asserts session.policyPresets after calling
ctx.channelModule.removeSandboxChannel("test-sb", { channel: "telegram" }) but
does not align or assert the sandbox/session messaging channel list; ensure the
Telegram fixture state matches the preamble by setting
sessionStore.messagingChannels (or ctx.sessionStore.messagingChannels) to
include "telegram" before invoking removeSandboxChannel and add assertions that
the session's messagingChannels no longer include "telegram" (and that other
channels remain unchanged) in addition to the existing session.policyPresets
checks so the test fails on regressions that leave a removed channel registered.

27-32: ⚡ Quick win

Sanitize inherited messaging env vars before spawning the helper script.

Forwarding full process.env here can leak CI/local messaging credentials into the spawned run and make channel-removal assertions flaky. Strip messaging-related keys first, then apply extraEnv for test-specific overrides.

Proposed patch
 function runScript(scriptBody: string, extraEnv: Record<string, string> = {}): SpawnSyncReturns<string> {
   const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-3998-"));
   const scriptPath = path.join(tmpDir, "script.js");
   fs.writeFileSync(scriptPath, scriptBody);
+  const sanitizedEnv: NodeJS.ProcessEnv = { ...process.env };
+  for (const key of Object.keys(sanitizedEnv)) {
+    if (/^(DISCORD_|TELEGRAM_)/.test(key)) delete sanitizedEnv[key];
+  }
   const result = spawnSync(process.execPath, [scriptPath], {
     cwd: repoRoot,
     encoding: "utf-8",
     env: {
-      ...process.env,
+      ...sanitizedEnv,
       HOME: tmpDir,
       NEMOCLAW_NON_INTERACTIVE: "1",
       ...extraEnv,
     },
     timeout: 15000,

Based on learnings: for hermetic messaging-channel spawned tests in this repo, remove inherited unrelated messaging env vars such as DISCORD_* and TELEGRAM_*.
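
As a standalone illustration of the same idea, the sketch below factors the filtering into a helper. The names `sanitizeMessagingEnv`, `buildChildEnv`, and `MESSAGING_ENV_PREFIXES` are hypothetical and do not come from the repository; the prefix list covers only the two prefixes named in this comment and is meant to be extended. The environment is passed in as a plain record so the helper can be exercised without touching the real `process.env`.

```typescript
// Prefixes named in the review comment; extend the list for other channels.
const MESSAGING_ENV_PREFIXES: readonly string[] = ["DISCORD_", "TELEGRAM_"];

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns a copy of `env` without messaging-related keys.
// The input object is never mutated.
function sanitizeMessagingEnv(
  env: Env,
  prefixes: readonly string[] = MESSAGING_ENV_PREFIXES,
): Env {
  const out: Env = {};
  for (const [key, value] of Object.entries(env)) {
    if (prefixes.some((prefix) => key.startsWith(prefix))) continue;
    out[key] = value;
  }
  return out;
}

// Hypothetical helper: sanitized base first, fixed test settings next,
// and test-specific overrides last so they always win.
function buildChildEnv(
  base: Env,
  home: string,
  extraEnv: Record<string, string> = {},
): Env {
  return {
    ...sanitizeMessagingEnv(base),
    HOME: home,
    NEMOCLAW_NON_INTERACTIVE: "1",
    ...extraEnv,
  };
}
```

Because `extraEnv` is spread last, a test that deliberately needs a messaging variable can still set one; only inherited values are stripped.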

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/channels-remove-full-teardown.test.ts` around lines 27 - 32, The test
currently spreads the entire process.env into the spawned helper (env: {
...process.env, HOME: tmpDir, NEMOCLAW_NON_INTERACTIVE: "1", ...extraEnv })
which can leak CI/local messaging credentials; instead create a sanitized env
object by cloning process.env, delete or filter out messaging-related keys (e.g.
keys matching prefixes like "DISCORD_" and "TELEGRAM_"), then use that
sanitizedEnv when composing the final env (set HOME: tmpDir and
NEMOCLAW_NON_INTERACTIVE plus ...extraEnv) so messaging vars are stripped before
the spawn.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d72ec710-b09d-4414-9b09-6fc8dcaa95af

📥 Commits

Reviewing files that changed from the base of the PR and between 322d339 and ebd67e4.

📒 Files selected for processing (2)
  • src/lib/actions/sandbox/policy-channel.ts
  • test/channels-remove-full-teardown.test.ts

…solation

Address review on #4001:

- Refuse to proceed to rebuild when the in-sandbox cleanup for a QR-paired
  channel fails. Without this guard, a failed `rm -rf` of the auth blob
  would still queue a rebuild, the state_dirs backup would re-capture the
  Baileys session, and #3998 would recur. Move the cleanup ahead of the
  registry/policy/session mutations so an early bail leaves the sandbox
  in a state where a re-run can complete the removal cleanly. Token-based
  channels keep the prior best-effort cleanup since token revocation
  already breaks the bot.
- Add a failure-path test asserting that QR-channel cleanup failure
  exits non-zero, does NOT queue a rebuild, and leaves the registry,
  policy preset, and session.policyPresets unchanged.
- Stop the test subprocess from inheriting messaging-channel env vars
  (`TELEGRAM_*`, `DISCORD_*`, `SLACK_*`, `WECHAT_*`, `WEIXIN_*`,
  `WHATSAPP_*`). A local/CI shell with one of these set would otherwise
  perturb the channel path taken inside the spawned test.
- Fix the telegram fixture: registry now reports the channel under test
  (`telegram`, not stale `whatsapp`) and the test asserts that
  `messagingChannels` is emptied via `updateSandbox` alongside the
  session.policyPresets strip.
- Document the new behaviour in `docs/reference/commands.mdx` and
  `docs/manage-sandboxes/messaging-channels.mdx` — the destructive
  in-sandbox cleanup path per agent, and the early-bail contract for
  QR-paired channels when the sandbox is unreachable.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
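
The ordering contract described in the commit message above can be sketched roughly as follows. This is a simplified model, not the code in `policy-channel.ts`: the `RemovalSteps` interface, the `qrPaired` flag, and the synchronous signatures are assumptions made for illustration. It shows only the two properties the commit describes: cleanup runs before any host-side mutation, and a cleanup failure aborts only for QR-paired channels.

```typescript
// Hypothetical shape of a channel for this sketch.
type Channel = { name: string; qrPaired: boolean };

// Hypothetical step callbacks; the real code may be asynchronous.
interface RemovalSteps {
  // Returns true only when the in-sandbox cleanup is confirmed.
  clearDurableState(channel: string): boolean;
  // Registry, policy preset, and session.policyPresets changes.
  applyMutations(channel: string): void;
  queueRebuild(): void;
}

type RemovalResult = { exitCode: number; rebuilt: boolean };

function removeChannel(channel: Channel, steps: RemovalSteps): RemovalResult {
  // Cleanup runs first, so an early bail leaves host-side state untouched
  // and a later re-run can complete the removal.
  const cleared = steps.clearDurableState(channel.name);
  if (!cleared && channel.qrPaired) {
    // QR-paired channel: a rebuild now would back up the auth files again.
    return { exitCode: 1, rebuilt: false };
  }
  // Token-based channels continue on a best-effort basis.
  steps.applyMutations(channel.name);
  steps.queueRebuild();
  return { exitCode: 0, rebuilt: true };
}
```

Recording the order in which the callbacks fire is one way a test can check both the abort path and the ordering invariant.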
@github-actions

Selective E2E Results — ✅ All requested jobs passed

Run: 26228799819
Target ref: ebd67e445a33617609bca1270ab977e6623fbf58
Workflow ref: main
Requested jobs: channels-add-remove-e2e,channels-stop-start-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| channels-add-remove-e2e | ✅ success |
| channels-stop-start-e2e | ⚠️ cancelled |

@github-actions

Selective E2E Results — ❌ Some jobs failed

Run: 26231106795
Target ref: d371b334bc655174757339a4384a7f74aeaed8dd
Workflow ref: main
Requested jobs: channels-stop-start-e2e,channels-add-remove-e2e
Summary: 1 passed, 1 failed, 0 skipped

| Job | Result |
| --- | --- |
| channels-add-remove-e2e | ✅ success |
| channels-stop-start-e2e | ❌ failure |

Failed jobs: channels-stop-start-e2e. Check run artifacts for logs.

Nightly E2E (test/e2e/test-channels-stop-start.sh) regressed on the previous
commit because `executeSandboxExecCommand` can return null on a transient
openshell wrapper hiccup (banner/marker race) even when the underlying
sandbox is reachable. Treating that null as "cleanup failed" aborted
`channels remove whatsapp` on every live-sandbox run.

Wrap rm with a NEMOCLAW_CHANNEL_CLEAR_OK sentinel and check stdout for
that exact string instead of trusting the wrapper's status. Try
`openshell sandbox exec` first and fall back to `executeSandboxCommand`
(SSH) if the exec path does not surface the sentinel — mirrors the
existing pattern in process-recovery.ts:286-296 for gateway recovery.
Both paths must fail before we abort the remove.

Add a regression test covering the SSH-fallback recovery path; rename
and tighten the abort test to require BOTH transports to fail.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
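
A minimal sketch of the sentinel-plus-fallback pattern from the commit message above, under stated assumptions. The `Transport` type, `withSentinel`, `sawSentinel`, and `runWithFallback` are hypothetical names, and the transports are modeled as synchronous functions that return stdout or `null`; the real `executeSandboxExecCommand` and `executeSandboxCommand` signatures are not shown in this thread. Matching the sentinel as a whole trimmed line is one reading of "that exact string".

```typescript
const SENTINEL = "NEMOCLAW_CHANNEL_CLEAR_OK";

// Hypothetical transport: runs a shell command inside the sandbox and returns
// stdout, or null when the wrapper itself fails.
type Transport = (command: string) => string | null;

// Append the sentinel so success is proven by output, not by wrapper status.
function withSentinel(command: string): string {
  return `${command} && echo ${SENTINEL}`;
}

// True only when some output line is exactly the sentinel (ignoring whitespace).
function sawSentinel(stdout: string | null): boolean {
  if (stdout === null) return false;
  return stdout.split(/\r?\n/).some((line) => line.trim() === SENTINEL);
}

// Try each transport in order and stop at the first one that surfaces the
// sentinel. Returns false only when every transport fails to do so.
function runWithFallback(
  command: string,
  transports: readonly Transport[],
): boolean {
  const wrapped = withSentinel(command);
  for (const transport of transports) {
    if (sawSentinel(transport(wrapped))) return true;
  }
  return false;
}
```

Passing the exec transport first and the SSH transport second gives the behavior the commit describes: a `null` from the first path no longer aborts the removal on its own, and the caller aborts only when `runWithFallback` returns `false`.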

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
docs/manage-sandboxes/messaging-channels.mdx (1)

163-163: ⚡ Quick win

Split into separate sentences and use active voice.

This line has two style issues:

  1. One sentence per line: The semicolon connects two independent clauses that should be on separate lines to improve diff readability. As per coding guidelines, flag when multiple sentences appear on the same line.

  2. Passive voice: "stay untouched" is passive. Rewrite using active voice, for example: "the command leaves the registry, policy preset, and session.policyPresets unchanged" or "the command does not modify the registry, policy preset, or session.policyPresets".

Proposed rewrite

Split the semicolon-joined clauses and convert to active voice:

-If neither transport can reach a running sandbox for a QR-paired channel, the command exits non-zero and asks you to start the sandbox and re-run; the registry, policy preset, and `session.policyPresets` stay untouched so a re-run can complete the removal cleanly.
+If neither transport can reach a running sandbox for a QR-paired channel, the command exits non-zero and asks you to start the sandbox and re-run.
+The command does not modify the registry, policy preset, or `session.policyPresets` so a re-run can complete the removal cleanly.

As per coding guidelines: "One sentence per line in source (makes diffs readable)" and "Active voice required. Flag passive constructions."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/manage-sandboxes/messaging-channels.mdx` at line 163, Split the
semicolon-joined clause into two separate sentences (one per line) and convert
the passive phrase "stay untouched" to active voice; for example, change the
sentence to: "If neither transport can reach a running sandbox for a QR-paired
channel, the command exits non-zero and asks you to start the sandbox and
re-run. The command leaves the registry, policy preset, and
`session.policyPresets` unchanged so a re-run can complete the removal cleanly."
Update the line containing `session.policyPresets` accordingly.
test/channels-remove-full-teardown.test.ts (1)

310-327: ⚡ Quick win

Assert SSH fallback was actually attempted in the dual-failure test.

This case says “both exec and SSH cleanup fail,” but it only verifies sandboxExecCalls. A regression that skips SSH fallback would still pass.

Proposed test tightening
     process.stdout.write("\\n__RESULT__" + JSON.stringify({
       sandboxExecCalls: ctx.sandboxExecCalls,
+      sandboxSshCalls: ctx.sandboxSshCalls,
       sessionPolicyPresets: ctx.sessionStore.policyPresets,
       removedPresets: ctx.removedPresets,
       registryUpdates: ctx.registryUpdates,
       callOrder: ctx.callOrder,
       exitCode: ctx.getExitCode(),
@@
       process.stdout.write("\\n__RESULT__" + JSON.stringify({
         sandboxExecCalls: ctx.sandboxExecCalls,
+        sandboxSshCalls: ctx.sandboxSshCalls,
         sessionPolicyPresets: ctx.sessionStore.policyPresets,
         removedPresets: ctx.removedPresets,
         registryUpdates: ctx.registryUpdates,
         callOrder: ctx.callOrder,
         exitCode: ctx.getExitCode(),
@@
     const cleanupCalls = payload.sandboxExecCalls.filter((c: { command: string }) =>
       c.command.startsWith("rm -rf"),
     );
     assert.equal(cleanupCalls.length, 1, "expected the rm -rf attempt that failed");
+    assert.equal(
+      payload.sandboxSshCalls.length,
+      1,
+      "SSH fallback must also be attempted before aborting",
+    );

Also applies to: 341-366

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/channels-remove-full-teardown.test.ts` around lines 310 - 327, The test
currently only asserts sandbox exec attempts (ctx.sandboxExecCalls) for the
dual-failure "both exec and SSH cleanup fail" scenario; add an assertion that
the SSH fallback was invoked by checking the SSH-call tracking structure (e.g.,
ctx.sshExecCalls or whatever variable records SSH attempts) after the run and
include the same check in the parallel block around lines 341-366 so the test
fails if SSH fallback was skipped; locate the teardown/result printing blocks
that reference ctx.sandboxExecCalls, ctx.removedPresets, ctx.registryUpdates,
ctx.callOrder and augment them to also include/assert ctx.sshExecCalls (or the
actual SSH-tracking symbol) to verify SSH was attempted.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 58129e74-7856-47cd-8b70-1b5cc9f73262

📥 Commits

Reviewing files that changed from the base of the PR and between d371b33 and 9ce04c1.

📒 Files selected for processing (4)
  • docs/manage-sandboxes/messaging-channels.mdx
  • docs/reference/commands.mdx
  • src/lib/actions/sandbox/policy-channel.ts
  • test/channels-remove-full-teardown.test.ts
✅ Files skipped from review due to trivial changes (1)
  • docs/reference/commands.mdx

@github-actions

Selective E2E Results — ✅ All requested jobs passed

Run: 26236091097
Target ref: 9ce04c1c15f4cab74464923c47396e16583f739b
Workflow ref: main
Requested jobs: channels-stop-start-e2e,channels-add-remove-e2e
Summary: 2 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| channels-add-remove-e2e | ✅ success |
| channels-stop-start-e2e | ✅ success |

Labels

fix Integration: WhatsApp Use this label to identify WhatsApp communication integration issues with NemoClaw.

Development

Successfully merging this pull request may close these issues.

[macOS][Agent&Skills] channels remove whatsapp does not teardown Baileys session or remove whatsapp policy preset — channel stays active after rebuild