
chore: upgrade agent runtime dependencies #3925

Open
ericksoa wants to merge 25 commits into main from reopen/pr-3832-upgrade-deps

ericksoa commented May 20, 2026

Summary

Verification

Links

Do not merge until release approval is explicit.

Summary by CodeRabbit

  • Chores

    • Pinned/upgraded runtime components: OpenClaw 2026.5.18, OpenShell 0.0.44, Hermes 2026.5.16, WeChat plugin 2.4.3; added a runtime ENV variable that marks the WeChat plugin as preinstalled.
  • New Features

    • WeChat seeding now detects plugin presence from multiple sources and enables all discovered channels.
    • Added standalone agent JSON parser utility for consistent reply extraction.
  • Improvements

    • More robust Kimi inference splitting for combined tool calls.
    • Preserve image-managed extensions during sandbox restore; safer restore/cleanup commands.
  • Tests

    • Expanded e2e and unit tests covering parsing, plugin discovery, patching, and restore behaviors.
  • Documentation

    • Updated command examples and workflow comments to reflect new pinned versions.
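The parser's source is not reproduced in this PR view. The walkthrough describes it as extracting assistant text from top-level `payloads` or `result.payloads`, tolerating concatenated or streamed JSON, and reporting invalid JSON. Below is a minimal Python sketch of that described behaviour only. It assumes each payload carries its text under a `text` key; `extract_reply`, `iter_json_documents`, and the `text` key are hypothetical names, not the helper's real interface.

```python
import json
from typing import Iterator


def iter_json_documents(raw: str) -> Iterator[object]:
    """Yield each JSON value from a string that may hold several concatenated documents."""
    decoder = json.JSONDecoder()
    idx = 0
    length = len(raw)
    while True:
        # raw_decode does not skip leading whitespace, so do it between documents.
        while idx < length and raw[idx].isspace():
            idx += 1
        if idx >= length:
            return
        try:
            value, idx = decoder.raw_decode(raw, idx)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON at offset {exc.pos}: {exc.msg}") from exc
        yield value


def extract_reply(raw: str) -> str:
    """Join assistant text from top-level `payloads` or `result.payloads` (assumed shape)."""
    parts = []
    for doc in iter_json_documents(raw):
        if not isinstance(doc, dict):
            continue
        payloads = doc.get("payloads")
        if payloads is None and isinstance(doc.get("result"), dict):
            payloads = doc["result"].get("payloads")
        for payload in payloads or []:
            # Assumption: each payload stores its text under "text".
            if isinstance(payload, dict) and isinstance(payload.get("text"), str):
                parts.append(payload["text"])
    return "\n".join(parts)
```

Calling `json.JSONDecoder.raw_decode` in a loop handles back-to-back documents without requiring newline-delimited JSON, and a truncated stream surfaces as a `ValueError` instead of an empty reply.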

ericksoa added the Integration: OpenClaw, CI/CD, and OpenShell labels May 20, 2026
ericksoa self-assigned this May 20, 2026
ericksoa added the dependencies, E2E, Integration: Hermes, and v0.0.48 (release target) labels May 20, 2026

coderabbitai Bot commented May 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 883aa45b-ad05-4812-9071-3abdb1669746

📥 Commits

Reviewing files that changed from the base of the PR and between 64efd11 and fa28360.

📒 Files selected for processing (3)
  • .agents/skills/nemoclaw-user-reference/references/commands.md
  • docs/reference/commands.mdx
  • src/lib/onboard.ts
✅ Files skipped from review due to trivial changes (2)
  • docs/reference/commands.mdx
  • .agents/skills/nemoclaw-user-reference/references/commands.md

📝 Walkthrough

This PR systematically upgrades OpenClaw (2026.4.24→2026.5.18), OpenShell (0.0.39→0.0.44), Hermes (2026.4.23→2026.5.16), and WeChat plugin (2.4.2→2.4.3), while introducing enhanced WeChat metadata discovery with dynamic channel registration, refactoring Kimi inference plugin exec-call splitting with content-index awareness, improving E2E test infrastructure with a shared JSON parser utility, and strengthening sandbox state restoration with image-managed extension preservation.
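The restore change in `src/lib/state/sandbox.ts` is summarized only at a high level in this thread. One way to express "preserve image-managed extensions" is to exclude them both from the state archive and from the pre-restore cleanup, validating every name before it reaches a shell command. The Python sketch below assumes GNU tar run relative to the state directory and a `find`-based cleanup; `MANAGED_EXTENSIONS`, its entries, `tar_exclude_args`, and `cleanup_command` are hypothetical, not NemoClaw's TypeScript API.

```python
import re
import shlex

# Hypothetical names of extensions baked into the sandbox image; the real
# managed-extension list lives in src/lib/state/sandbox.ts.
MANAGED_EXTENSIONS = ("managed-ext-a", "managed-ext-b")

# Plain directory names only: no path separators, quotes, or shell syntax.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def validate_extension_name(name: str) -> str:
    """Reject names that could escape the extensions directory or inject shell syntax."""
    if not _SAFE_NAME.fullmatch(name) or ".." in name:
        raise ValueError(f"unsafe extension name: {name!r}")
    return name


def tar_exclude_args(ext_rel_dir: str, preserved=MANAGED_EXTENSIONS) -> list:
    """GNU tar flags that keep preserved extensions out of the state archive."""
    return [f"--exclude={ext_rel_dir}/{validate_extension_name(n)}" for n in preserved]


def cleanup_command(ext_dir: str, preserved=MANAGED_EXTENSIONS) -> str:
    """Shell command removing every top-level entry of ext_dir except preserved ones."""
    argv = ["find", ext_dir, "-mindepth", "1", "-maxdepth", "1"]
    for name in preserved:
        argv += ["!", "-name", validate_extension_name(name)]
    argv += ["-exec", "rm", "-rf", "{}", "+"]
    # shlex.join quotes each token as needed, e.g. "!" -> '!' and "{}" -> '{}'.
    return shlex.join(argv)
```

Validating names up front, rather than relying on quoting alone, makes a malformed list entry fail loudly instead of silently leaving an image-managed extension unprotected from the `rm -rf` step.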

Changes

Version upgrade and plugin integration enhancement

| Layer | File(s) | Summary |
|---|---|---|
| Version pins and manifest updates | `Dockerfile.base`, `agents/hermes/Dockerfile.base`, `agents/hermes/manifest.yaml`, `agents/openclaw/manifest.yaml`, `nemoclaw-blueprint/blueprint.yaml`, `nemoclaw/package.json`, `nemoclaw/src/package-metadata.test.ts` | OpenClaw, OpenShell, Hermes, and WeChat plugin versions are bumped across build arguments, manifests, and package.json compatibility constraints. |
| Onboarding and installer version defaults | `src/lib/onboard.ts`, `src/lib/onboard/openshell-install.ts`, `src/lib/onboard/docker-driver-gateway-runtime-marker.test.ts`, `scripts/brev-launchable-ci-cpu.sh`, `scripts/install-openshell.sh`, `.github/workflows/regression-e2e.yaml` | OpenShell fallback/version pin updated from 0.0.39 → 0.0.44 across onboarding logic, installer scripts, CI headers, and test fixtures. |
| Dockerfile build environment | `Dockerfile` | Patch-5 handshake comment wording generalized, pinned compiled selection runtime comment updated, and `NEMOCLAW_OPENCLAW_WECHAT_PLUGIN_PREINSTALLED` ENV added. |
| WeChat plugin discovery and dynamic registration | `scripts/generate-openclaw-config.py`, `scripts/seed-wechat-accounts.py`, `test/generate-openclaw-config.test.ts`, `test/seed-wechat-accounts.test.ts` | Adds npm/extension metadata discovery, environment preinstall signal, and dynamic channel-id enumeration; seeding now enables all discovered channel IDs and prefers npm install path when present. |
| Kimi inference plugin refactor | `nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js`, `test/kimi-inference-compat-plugin.test.ts` | Refactors safe exec splitting: per-block extraction, split tool-call construction with dedupe, content-index-aware event/delta rewriting, and argument encoding that matches original delta shapes. |
| Tool catalog patching enhancement | `scripts/patch-openclaw-tool-catalog.js`, `test/openclaw-tool-catalog-patch.test.ts` | Detect native tool-search shaped selection runtimes via pattern list and return "native-tool-search" status instead of forcing compact-catalog rewrites. |
| Policy endpoint extraction | `src/lib/policy/index.ts`, `test/policies.test.ts` | Tightens host: extraction regex to support YAML list-item style and to stop at comment/whitespace boundaries; adds tests for commented hosts and backtick filtering. |
| Shared E2E JSON parser utility | `test/e2e/lib/openclaw-agent-json.py`, `test/openclaw-agent-json.test.ts` | New helper extracts assistant text from OpenClaw JSON envelopes (top-level payloads or result.payloads), supports concatenated/streamed JSON, and reports invalid JSON errors; tests added. |
| Slack API proof resilience | `test/e2e/lib/slack-api-proof.sh` | Refactors OpenClaw root probing with createRequire/require.resolve multi-path probing, returns null when unresolved, and adds a hermetic fallback proof path. |
| E2E test JSON parser migration | multiple `test/e2e/*.sh` | Replaces inline Python JSON parsing snippets across E2E shell scripts with the shared openclaw-agent-json.py helper. |
| Crash recovery and sandbox operations | `test/e2e/test-issue-2478-crash-loop-recovery.sh`, `test/e2e/test-sandbox-operations.sh` | Adds DASHBOARD_PORT, improves gateway PID discovery/readiness checks (logs, /health, nemoclaw status), uses openshell sandbox list for diagnostics, preserves stderr, and requires successful SSH exit plus parsed reply. |
| Sandbox state restoration with extension preservation | `src/lib/state/sandbox.ts`, `test/snapshot.test.ts` | Introduce managed-extension list, builders to exclude preserved extensions from tar/cleanup, and selective cleanup command construction with validation and tests to ensure image-managed extensions are preserved. |
| Sandbox rebuild WeChat seeding | `src/lib/actions/sandbox/rebuild.ts` | After post-restore doctor step, optionally invoke seed-wechat-accounts.py inside the sandbox to re-apply WeChat seeding when present. |
| E2E version fixtures and assertions | many `src/test` and `test/e2e/*` files | Update mocks, assertions, and parity maps to reflect OpenClaw 2026.5.18 and OpenShell 0.0.44/0.0.45 pinned behaviors across tests. |
| Test suite comprehensive updates | `test/generate-openclaw-config.test.ts`, `test/seed-wechat-accounts.test.ts`, `test/fetch-guard-patch-regression.test.ts`, `test/kimi-inference-compat-plugin.test.ts`, `test/openclaw-tool-catalog-patch.test.ts`, `test/snapshot.test.ts` | Adds/updates tests for WeChat metadata handling, tool-catalog native search, Dockerfile patch regression, Kimi plugin behavior, and snapshot/restore extension preservation. |
| Documentation and comment updates | `.agents/skills/nemoclaw-user-reference/references/commands.md`, `docs/reference/commands.mdx`, `agents/hermes/config/hermes-config.ts`, `test/e2e/test-full-e2e.sh` | Update example outputs and inline comments to reflect new versions and behavior notes. |
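The policy endpoint row says host extraction now accepts YAML list-item lines and stops at comment or whitespace boundaries. The actual regex in `src/lib/policy/index.ts` is not shown here, so the following Python sketch is only an illustrative pattern under those stated rules. `HOST_LINE` and `extract_hosts` are hypothetical names, and treating backtick-wrapped values as non-hosts is an assumption about what "backtick filtering" means.

```python
import re

# Illustrative only: optional "- " list marker, then "host:", an optional
# quote, and a host token that ends at whitespace, a quote, a backtick, or "#".
HOST_LINE = re.compile(r"""\s*(?:-\s+)?host:\s*["']?([^\s"'`#]+)""")


def extract_hosts(policy_text: str) -> list:
    """Return host values from policy YAML text, one per matching line."""
    hosts = []
    for line in policy_text.splitlines():
        # re.match anchors at the start of the line, so "# host: ..." never matches.
        m = HOST_LINE.match(line)
        if m:
            hosts.append(m.group(1))
    return hosts
```

A regex scan like this is a heuristic for inventory and validation messages, not a YAML parser; enforcement should still come from the parsed policy.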

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#3808: Updates scripts/patch-openclaw-tool-catalog.js with native-tool-search detection and status propagation related to this PR's patcher changes.
  • NVIDIA/NemoClaw#3926: Modifies WeChat seeding logic in scripts/seed-wechat-accounts.py, closely related to the channel discovery and seeding changes here.
  • NVIDIA/NemoClaw#3839: Also modifies WeChat seeding flow and re-run semantics after config mutations, aligning with sandbox rebuild seeding changes.

Suggested reviewers

  • cv
  • jyaunches

Poem

🐰 Hopping through pins from old to new,
OpenClaw and Hermes get a brightened hue.
WeChat finds channels by reading the ground,
Kimi splits commands without a rebound,
E2E parser hums — tests sing through and through.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: upgrading agent runtime dependencies (OpenClaw, OpenShell, Hermes, and WeChat plugin versions). It is concise, specific, and accurately reflects the primary objective of the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


github-actions Bot commented May 20, 2026

E2E Advisor Recommendation

Required E2E: test-e2e-sandbox, test-e2e-gateway-isolation, cloud-e2e, onboard-inference-smoke-e2e, openshell-version-pin-e2e, openshell-gateway-upgrade-e2e, launchable-smoke-e2e, sandbox-operations-e2e, rebuild-openclaw-e2e, upgrade-stale-sandbox-e2e, hermes-e2e, rebuild-hermes-e2e, kimi-inference-compat-e2e, messaging-compatible-endpoint-e2e, bedrock-runtime-compatible-anthropic-e2e, openclaw-inference-switch-e2e, network-policy-e2e, credential-sanitization-e2e, gateway-health-honest-e2e, gateway-drift-preflight-e2e
Optional E2E: brave-search-e2e, openclaw-slack-pairing-e2e, channels-add-remove-e2e, channels-stop-start-e2e, model-router-provider-routed-inference-e2e, openclaw-plugin-runtime-exdev-e2e, rebuild-hermes-stale-base-e2e, issue-2478-crash-loop-recovery-e2e

Dispatch hint: cloud-e2e,openshell-gateway-upgrade-e2e,launchable-smoke-e2e,sandbox-operations-e2e,rebuild-openclaw-e2e,upgrade-stale-sandbox-e2e,hermes-e2e,rebuild-hermes-e2e,kimi-inference-compat-e2e,messaging-compatible-endpoint-e2e,bedrock-runtime-compatible-anthropic-e2e,openclaw-inference-switch-e2e,network-policy-e2e,credential-sanitization-e2e

Auto-dispatched E2E: cloud-e2e, openshell-gateway-upgrade-e2e, launchable-smoke-e2e, sandbox-operations-e2e, rebuild-openclaw-e2e, upgrade-stale-sandbox-e2e, hermes-e2e, rebuild-hermes-e2e, kimi-inference-compat-e2e, messaging-compatible-endpoint-e2e, bedrock-runtime-compatible-anthropic-e2e, openclaw-inference-switch-e2e, network-policy-e2e, credential-sanitization-e2e via nightly-e2e.yaml at fa28360ebd0df3542f51a62c99aa6e131e1e3e35 (nightly run)

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • test-e2e-sandbox (medium): Builds the production sandbox image and runs the in-container sandbox E2E smoke; required for Dockerfile/OpenClaw version and plugin-install patch changes.
  • test-e2e-gateway-isolation (medium): Validates gateway isolation/security behavior against the production image after Dockerfile, gateway auth/token, and OpenClaw runtime patch changes.
  • cloud-e2e (high): Full install → onboard → live inference → CLI flow is required because onboarding, deployment verification, blueprint pins, OpenClaw config generation, and runtime images changed.
  • onboard-inference-smoke-e2e (low): Onboard must only succeed after a real route works; required for onboard.ts, OpenShell install, OpenClaw config generation, and provider version changes.
  • openshell-version-pin-e2e (low): OpenShell min/max pin changed to 0.0.44 and install-openshell/version-check logic changed; this validates replacement of unsupported sticky OpenShell versions.
  • openshell-gateway-upgrade-e2e (high): OpenShell release pin and install/upgrade path changed; validates upgrading an existing install while preserving sandbox agent state.
  • launchable-smoke-e2e (medium): brev-launchable-ci-cpu.sh changes the default OpenShell version and preinstall path; validates the community/launchable install flow end-to-end.
  • sandbox-operations-e2e (high): Sandbox state and rebuild/lifecycle source changed; validates list/status/connect/logs/destroy/recovery and multi-sandbox metadata behavior.
  • rebuild-openclaw-e2e (high): OpenClaw version bump and rebuild implementation changes need a live rebuild proof that workspace/state survive and the running agent upgrades.
  • upgrade-stale-sandbox-e2e (medium): Version/stale-sandbox behavior is in scope through OpenClaw/OpenShell pin changes and sandbox version tests; validates upgrade handling for older sandboxes.
  • hermes-e2e (high): Hermes base image, manifest expected version, and config changed; validate Hermes onboarding, health, and basic assistant flow.
  • rebuild-hermes-e2e (high): Hermes version bump and manifest changes require state-preserving rebuild validation for the Hermes agent path.
  • kimi-inference-compat-e2e (medium): The Kimi inference compatibility plugin changed materially around safe exec splitting, streaming deltas, and dedupe; this exact real sandbox trajectory must run.
  • messaging-compatible-endpoint-e2e (medium): Config generation, messaging channel setup, and compatible endpoint routing are changed; validates Telegram plus inference.local/OpenAI-compatible path through the sandbox.
  • bedrock-runtime-compatible-anthropic-e2e (high): Provider compatibility and hidden credential routing can be affected by config generation and agent/runtime version bumps; validates both OpenClaw and Hermes compatible Bedrock Runtime paths.
  • openclaw-inference-switch-e2e (medium): Inference route/config code and OpenClaw runtime version changed; validates switching a running OpenClaw sandbox and serving live requests afterward.
  • network-policy-e2e (medium): Policy source and Dockerfile fetch/proxy security patches changed; validates enforced network policy and sandbox egress boundaries.
  • credential-sanitization-e2e (medium): WeChat seeding, credential placeholders, config generation, and policy/security boundaries changed; validate secrets are not leaked into snapshots or sandbox-readable state.
  • gateway-health-honest-e2e (low): OpenShell install/gateway runtime marker and deployment verification changes can affect gateway health classification; this guards false healthy reports.
  • gateway-drift-preflight-e2e (low): Sandbox state and OpenShell gateway version changes can alter fail-closed behavior for stale gateway images/protobuf drift.

Optional E2E

  • brave-search-e2e (medium): Useful confidence for web-search/tool credential routing after OpenClaw version and config generation changes, but no Brave-specific source changed.
  • openclaw-slack-pairing-e2e (medium): Slack proof helper changed and OpenClaw runtime changed; optional targeted coverage for Slack pairing and credential rewrite if Slack behavior is suspected.
  • channels-add-remove-e2e (medium): Adjacent messaging lifecycle confidence after generate-openclaw-config and channel plugin changes; not strictly required unless channel add/remove code changed.
  • channels-stop-start-e2e (medium): Adjacent channel lifecycle coverage for OpenClaw plugin/runtime version changes, especially messaging bridge stability.
  • model-router-provider-routed-inference-e2e (medium): Optional routed-provider confidence for broad inference/runtime version changes; router-specific files were not the main diff.
  • openclaw-plugin-runtime-exdev-e2e (low): Optional plugin runtime confidence after OpenClaw version bump and plugin install/catalog patch changes.
  • rebuild-hermes-stale-base-e2e (high): Optional extra guard for Hermes base-image refresh semantics after Hermes base/version bump; run if CI capacity allows.
  • issue-2478-crash-loop-recovery-e2e (medium): The test script changed and OpenClaw startup/runtime patches changed; useful soak coverage for gateway recovery but less directly tied than lifecycle/rebuild tests.

New E2E recommendations

  • WeChat onboarding and token seeding (high): This PR changes WeChat QR handling, preinstalled openclaw-weixin behavior, seed-wechat-accounts.py, and config generation, but there is no visible existing WeChat E2E equivalent to the Telegram/Slack/Discord flows.
    • Suggested test: Add a hermetic WeChat/iLink onboarding E2E that uses a fake iLink endpoint or fixture QR flow, verifies seeded account metadata, no secret leakage, bridge startup, rebuild preservation, and policy/proxy routing.
  • OpenClaw upstream patch drift on version bumps (medium): Dockerfile sed/grep patches target compiled OpenClaw internals; unit tests cover regexes, but a live E2E specifically asserting patched fetch-guard, tool catalog, and handshake behavior against the newly pinned OpenClaw version would reduce release-bump risk.
    • Suggested test: Add an OpenClaw-version-bump patch-integrity E2E that builds the production image, asserts all NemoClaw OpenClaw patches are present, boots a sandbox, and exercises a proxy media fetch plus tool-catalog prompt path.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: cloud-e2e,openshell-gateway-upgrade-e2e,launchable-smoke-e2e,sandbox-operations-e2e,rebuild-openclaw-e2e,upgrade-stale-sandbox-e2e,hermes-e2e,rebuild-hermes-e2e,kimi-inference-compat-e2e,messaging-compatible-endpoint-e2e,bedrock-runtime-compatible-anthropic-e2e,openclaw-inference-switch-e2e,network-policy-e2e,credential-sanitization-e2e
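To dispatch the hinted jobs by hand, one option is the GitHub CLI's `gh workflow run` with a raw `-f` field for the `jobs` input. This Python sketch only builds and prints the command. It assumes `gh` is installed and authenticated and that `nightly-e2e.yaml` accepts a `workflow_dispatch` run on the PR branch, which this thread does not confirm; `build_dispatch_argv` is a hypothetical helper.

```python
import shlex

# Job list copied verbatim from the dispatch hint above.
DISPATCH_JOBS = (
    "cloud-e2e,openshell-gateway-upgrade-e2e,launchable-smoke-e2e,"
    "sandbox-operations-e2e,rebuild-openclaw-e2e,upgrade-stale-sandbox-e2e,"
    "hermes-e2e,rebuild-hermes-e2e,kimi-inference-compat-e2e,"
    "messaging-compatible-endpoint-e2e,bedrock-runtime-compatible-anthropic-e2e,"
    "openclaw-inference-switch-e2e,network-policy-e2e,credential-sanitization-e2e"
)


def build_dispatch_argv(workflow: str, ref: str, jobs: str) -> list:
    """Build a `gh workflow run` argv that passes the job list as the `jobs` input."""
    names = [j.strip() for j in jobs.split(",") if j.strip()]
    if len(names) != len(set(names)):
        raise ValueError("duplicate job names in dispatch list")
    return ["gh", "workflow", "run", workflow, "--ref", ref, "-f", "jobs=" + ",".join(names)]


if __name__ == "__main__":
    argv = build_dispatch_argv("nightly-e2e.yaml", "reopen/pr-3832-upgrade-deps", DISPATCH_JOBS)
    print(shlex.join(argv))
```

`--ref` names a branch or tag rather than a commit, so results only count as exact-head evidence if fa28360 is still the branch head when the run starts.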

github-actions Bot commented May 20, 2026

PR Review Advisor

Recommendation: blocked
Confidence: high
Analyzed HEAD: fa28360ebd0df3542f51a62c99aa6e131e1e3e35
Findings: 5 blocker(s), 11 warning(s), 1 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations:
  • The supplied diff was truncated at 160000 characters; not every changed line in all 51 files was independently reviewed in full.
  • No live command execution, package-manager install, or test execution was performed; results are based on trusted supplied GitHub/diff context and read-only review.
  • Dependency CVE/license status for newly pinned upstream versions was not available in the supplied context.
  • E2E Advisor comments were available, but supplied E2E result comments do not demonstrate all required jobs passing for exact head SHA fa28360.
  • No linked issues were supplied, so linked-issue acceptance could not be mapped beyond PR-body, E2E Advisor, and review-comment clauses.
  • CodeRabbit and GitHub Advanced Security review comments were present and unresolved; final maintainer disposition was not available.
  • Open PR overlap and drift evidence were reviewed from supplied metadata, not by checking out and comparing every overlapping branch.

Workflow run

Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: fa28360ebd0df3542f51a62c99aa6e131e1e3e35
Recommendation: blocked
Confidence: high

Blocked: current-head CI is pending, mergeability is BLOCKED, 4 review threads remain unresolved, and required sandbox/runtime/security E2E has not passed for exact head SHA fa28360.

Gate status

  • CI: pending — GraphQL statusCheckRollup for fa28360 shows 14 pending/in-progress/queued contexts, including cli-parity, preview, E2E recommendation, wsl-e2e, macos-e2e, PR review advisor, CodeQL javascript-typescript, CodeQL python, unit-vitest-linux, checks, ShellCheck SARIF, build-sandbox-images, build-sandbox-images-arm64, and CodeRabbit.
  • Mergeability: fail — GitHub reports mergeStateStatus=BLOCKED, reviewDecision=REVIEW_REQUIRED, mergeable_state=blocked.
  • Review threads: fail — 4 unresolved review thread(s): CodeRabbit on Kimi argument shape line 194, CodeRabbit on safe-exec dedupe line 208, GitHub Advanced Security/CodeQL on line 303, and CodeRabbit on rebuild stdout guard line 834.
  • Risky code tested: warning — Risky areas detected: credentials/inference/network, installer/bootstrap shell, onboarding/host glue, sandbox/policy/SSRF, workflow/enforcement. Test files changed and some stale E2E passed, but exact-head E2E pass evidence is missing for fa28360.

🔴 Blockers

  • Current-head CI is still pending: The requested head SHA has multiple in-progress, queued, or pending checks. For runtime, sandbox, installer, workflow, and security-sensitive dependency upgrades, stale success from earlier commits is not sufficient evidence.
    • Recommendation: Wait for all required and relevant checks to complete successfully for fa28360, including unit tests, CodeQL, ShellCheck, image builds, platform E2E, CodeRabbit, and advisor jobs.
    • Evidence: GraphQL statusCheckRollup lists pending/in-progress/queued contexts including cli-parity, preview, E2E recommendation, wsl-e2e, macos-e2e, PR review advisor, CodeQL javascript-typescript, CodeQL python, unit-vitest-linux, checks, ShellCheck SARIF, build-sandbox-images, build-sandbox-images-arm64, and CodeRabbit.
  • Mergeability and review gates are blocked: The PR is not currently mergeable under repository gates and unresolved review threads require maintainer disposition before the branch can be considered release-ready.
    • Recommendation: Resolve or explicitly disposition all unresolved review threads, satisfy required review, and re-check mergeability after CI completes.
    • Evidence: GitHub reports mergeStateStatus=BLOCKED and reviewDecision=REVIEW_REQUIRED; deterministic context reports 4 unresolved review thread(s).
  • Required E2E evidence is missing for exact head SHA: The E2E Advisor required broad sandbox/runtime/security coverage, but supplied E2E result comments do not show all required jobs passing for fa28360. The latest required run evidence targets 64efd11 and still had openclaw-inference-switch-e2e failing.
    • Recommendation: Require all E2E Advisor required jobs to pass for the exact current head SHA before treating the PR as merge-ready.
    • Evidence: E2E Advisor required openshell-version-pin-e2e, openshell-gateway-upgrade-e2e, cloud-e2e, openclaw-plugin-runtime-exdev-e2e, kimi-inference-compat-e2e, messaging-compatible-endpoint-e2e, bedrock-runtime-compatible-anthropic-e2e, openclaw-inference-switch-e2e, sandbox-operations-e2e, rebuild-openclaw-e2e, network-policy-e2e, credential-sanitization-e2e, hermes-e2e, and launchable-smoke-e2e. Run 26194670119 targeted 64efd11, not fa28360, and failed openclaw-inference-switch-e2e.
  • Existing monolith grew by 78 lines (src/lib/state/sandbox.ts:1): The PR grows an already-large sandbox state module from 1574 to 1652 lines, violating the repository monolith-growth guard for current monoliths.
    • Recommendation: Extract the added behavior into focused helper modules or otherwise offset the growth before merge, with direct tests for the extracted logic.
    • Evidence: monolithDeltas reports src/lib/state/sandbox.ts baseLines=1574, headLines=1652, delta=78, severity=blocker.
  • Existing monolith grew by 26 lines (src/lib/actions/sandbox/rebuild.ts:1): The PR grows an already-large sandbox rebuild module from 835 to 861 lines, violating the repository monolith-growth guard for current monoliths.
    • Recommendation: Extract the new rebuild/WeChat seeding behavior into a focused helper and keep rebuild.ts from growing, with unit coverage for success and skip paths.
    • Evidence: monolithDeltas reports src/lib/actions/sandbox/rebuild.ts baseLines=835, headLines=861, delta=26, severity=blocker.
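Both monolith blockers reduce to a line-count comparison. Below is a Python sketch of such a guard, assuming a fixed set of files already classified as monoliths and that any positive delta on those files is a blocker. The guard's real implementation and thresholds are not shown in this thread; `CURRENT_MONOLITHS`, `FileDelta`, and `severity` are hypothetical. The line counts used to check it come from the evidence lines above.

```python
from dataclasses import dataclass

# Hypothetical: files the repository already treats as monoliths.
CURRENT_MONOLITHS = frozenset({
    "src/lib/state/sandbox.ts",
    "src/lib/actions/sandbox/rebuild.ts",
})


@dataclass(frozen=True)
class FileDelta:
    path: str
    base_lines: int
    head_lines: int

    @property
    def delta(self) -> int:
        return self.head_lines - self.base_lines


def severity(d: FileDelta, monoliths=CURRENT_MONOLITHS) -> str:
    """Any growth in an existing monolith blocks; everything else passes."""
    if d.path in monoliths and d.delta > 0:
        return "blocker"
    return "ok"
```

Under this rule the recommended remedy passes: extracting the new logic into a new helper module adds lines to a file that is not yet a monolith, while the original file shrinks or stays flat.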

🟡 Warnings

  • Split tool calls do not preserve original arguments schema (nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js:194): The split tool-call builder creates split tool calls with arguments as an object literal. If the original OpenClaw tool call used JSON-string arguments, message.content, partial.content, and event.toolCall can drift between string and object schemas mid-stream.
    • Recommendation: Preserve the original argument representation when cloning split tool calls, for example by using encodeToolCallArgumentsLike(toolCall.arguments, command) for all split tool-call construction sites, and add regression coverage for string-argument tool calls.
    • Evidence: Unresolved CodeRabbit thread at line 194 states buildSplitToolCalls returns arguments: { command } while encodeToolCallArgumentsLike() is only used for event.delta.
  • Safe exec dedupe may drop legitimate later tool calls (nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js:208): dedupeSafeExecToolCalls uses one global seenSafeExecCommands set across the whole content array. That can remove later hostname/date/uptime tool calls from different original tool invocations, losing legitimate tool requests rather than only removing duplicates from one split.
    • Recommendation: Scope dedupe to each original combined exec call, or only dedupe the replacement list produced for one split. Add regression tests covering two independent safe exec calls in the same message.
    • Evidence: Unresolved CodeRabbit thread at line 208 says the helper remembers hostname/date/uptime globally and can drop subsequent blocks from different tool calls.
  • CodeQL reports a useless assignment in stream rewrite path (nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js:303): GitHub Advanced Security flagged a useless assignment to local variable changed. This is not itself a confirmed vulnerability, but it indicates incomplete cleanup and leaves a required security review thread unresolved.
    • Recommendation: Remove or restructure the unused assignment and ensure CodeQL completes successfully on the current head.
    • Evidence: Unresolved GitHub Advanced Security review thread: 'CodeQL / Useless assignment to local variable. The value assigned to changed here is unused.'
  • Successful WeChat seed path can throw when stdout is undefined (src/lib/actions/sandbox/rebuild.ts:834): The rebuild success path calls seedWechatResult.stdout.includes(...). If executeSandboxCommand returns status 0 with stdout undefined, the rebuild path can throw despite the seed command succeeding.
    • Recommendation: Coerce stdout to a string before includes, e.g. const seedStdout = seedWechatResult.stdout ?? '', and cover the status-0/no-stdout case.
    • Evidence: Unresolved CodeRabbit thread at src/lib/actions/sandbox/rebuild.ts lines 827-834 reports the missing stdout guard.
  • Compiled OpenClaw SSRF/proxy patch remains high-risk (Dockerfile:130): The Dockerfile rewrites compiled OpenClaw fetch-guard behavior and neutralizes assertExplicitProxyAllowed inside OPENSHELL_SANDBOX. This intentionally changes SSRF/proxy enforcement boundaries in the sandbox runtime.
    • Recommendation: Have a maintainer explicitly review the OpenClaw 2026.5.18 fetch-guard and gateway code shape, confirm the OPENSHELL_SANDBOX-gated bypass remains scoped to the intended OpenShell trust boundary, and require network-policy/SSRF negative E2E for the current head.
    • Evidence: Dockerfile Patch 1 rewrites withStrictGuardedFetchMode exports and Patch 2 injects an OPENSHELL_SANDBOX-gated return into assertExplicitProxyAllowed.
  • Version pins changed while pinned sandbox image digest remains unchanged (nemoclaw-blueprint/blueprint.yaml:5): The blueprint now requires OpenShell 0.0.44 and OpenClaw 2026.5.18, but both the top-level digest and components.sandbox.image digest remain sha256:b3d832b596ab6b7184a9dcb4ae93337ca32851a4f93b00765cc12de26baa3a9a. If consumed before release tooling rewrites the digest, metadata can claim newer runtime compatibility while pointing at an older image.
    • Recommendation: Confirm release tooling rewrites both digest fields before publication, or update the digest in this PR if the branch is directly consumable. Add or confirm a test that the pinned digest image contains the declared runtime versions.
    • Evidence: blueprint.yaml changes min_openshell_version/max_openshell_version to 0.0.44 and min_openclaw_version to 2026.5.18 while digest fields remain unchanged.
  • Workflow trusted-code boundary requires explicit review (.github/workflows/regression-e2e.yaml:98): The PR changes a GitHub Actions workflow used for regression E2E. Trusted context flags workflow changes involving secrets/GitHub tokens and runtime dependency installs. Even small workflow changes are high-risk because E2E workflows may run repository scripts around inherited secrets or privileged credentials.
    • Recommendation: Verify the workflow still uses trusted checked-out code appropriately, avoids running PR-controlled scripts with privileged secrets except on trusted branches, keeps minimal permissions, and pins actions/dependencies where feasible.
    • Evidence: workflowSignals report workflow files changed, secrets or GitHub tokens appear in workflow diff, and workflow installs runtime dependencies. The changed workflow uses secrets: inherit and repository shell scripts.
  • Large reintroduction overlaps active work and high-churn files: This PR reintroduces the dependency/runtime upgrade work from #3832, which was reverted by #3924 (Revert "chore: upgrade agent runtime dependencies (#3832)"), and touches OpenShell/OpenClaw/Hermes, Docker, policy, docs, and E2E files with heavy recent history and active overlapping PRs. This raises drift risk and can supersede or contradict active fixes.
  • E2E assertions changed alongside runtime behavior (test/e2e/docs/parity-map.yaml:1): The PR modifies many E2E scripts and generated parity inventory while also changing the runtime under test. This can accidentally weaken or reclassify coverage at the same time behavior changes.
    • Recommendation: Review the semantic diff of E2E assertions, not just whether tests pass. Confirm negative paths remain covered for sandbox restore, OpenClaw agent JSON parsing, Slack proof, OpenShell version pinning, gateway upgrade, sandbox operations, and messaging-compatible endpoint routing.
    • Evidence: Changed files include test/e2e-test.sh, parity-inventory.generated.json, parity-map.yaml, openclaw-agent-json.py, slack-api-proof.sh, and many test/e2e/test-*.sh scripts.
  • Dependency and artifact review is incomplete for new runtime pins: OpenClaw, OpenShell, Hermes, WeChat plugin, uv, and workflow/runtime dependencies are upgraded. Several artifacts are version-pinned or checksum-pinned, but supplied context does not include CVE/license/advisory review for the new versions.
    • Recommendation: Complete dependency security and license review for the newly pinned versions and confirm installer/download checksum and registry trust assumptions remain valid.
    • Evidence: Diff updates OpenClaw 2026.5.18, OpenShell 0.0.44, Hermes v2026.5.16, @tencent-weixin/openclaw-weixin 2.4.3, and UV_VERSION 0.11.8; no OSV/Snyk/GitHub Advisory evidence was supplied.
  • WeChat/OpenClaw preinstall and account seeding lacks dedicated E2E: This PR changes WeChat QR/account state, seed-wechat-accounts.py, and preinstalled openclaw-weixin plugin behavior. Existing messaging E2Es focus mainly on Telegram/Discord/Slack and do not prove WeChat onboarding or seeded account activation.
    • Recommendation: Add or run a hermetic WeChat/OpenClaw plugin preinstall and account seeding E2E using fake iLink/WeChat endpoints, verifying seeded files, channel enablement, credential isolation, gateway startup, and rebuild persistence.
    • Evidence: E2E Advisor explicitly recommends a new 'WeChat/OpenClaw plugin preinstall and account seeding' test with suggested job wechat-onboard-seeding-e2e.

🔵 Suggestions

  • Release notes should call out operator rebuild guidance (docs/reference/commands.mdx:306): The command reference updates the displayed OpenClaw version, and the PR changes OpenShell, OpenClaw, Hermes, and WeChat plugin pins. Existing sandbox operators will need clear rebuild and compatibility guidance for the release track.
    • Recommendation: Ensure release notes or upgrade docs call out the OpenShell 0.0.44 pin, OpenClaw 2026.5.18 rebuild requirement, Hermes v2026.5.16, WeChat plugin 2.4.3 compatibility, and any required sandbox image digest update.
    • Evidence: docs/reference/commands.mdx and .agents/skills/.../commands.md update the displayed OpenClaw version to v2026.5.18; blueprint and installer pins also changed.

Acceptance coverage

  • unknown — No linked issues were detected for PR #3925.: github.linkedIssues is an empty array, so there are no linked issue acceptance clauses to map literally.
  • partial — Reopens the dependency/runtime upgrade work from #3832 after it was accidentally merged before release readiness.: The commit list and diff reintroduce a broad runtime dependency upgrade across OpenClaw, OpenShell, Hermes, Dockerfiles, workflow, and tests. Recent history shows #3832 and its revert, #3924. Current-head CI, E2E, and review gates are not complete.
  • partial — #3832 was reverted by #3924 to restore main; this PR reintroduces the same reviewed branch content from head cd41b2b.: Recent history includes e2c88b3 Revert "chore: upgrade agent runtime dependencies (#3832)" (#3924), and the branch contains the runtime upgrade work. Exact equivalence to cd41b2b was not independently proven because current head fa28360 includes later merge/fix commits.
  • partial — Keeps the OpenClaw/OpenShell/Hermes upgrade remediation intact for the v0.0.48 release track.: Diff updates OpenClaw to 2026.5.18, OpenShell min/max/pin to 0.0.44, Hermes to v2026.5.16, and PR labels include v0.0.48. However, merge gates and exact-head E2E are not complete, and blueprint image digest consistency needs confirmation.
  • partial — Synthetic merge against current main succeeded locally.: GitHub reports mergeable=true, but mergeStateStatus=BLOCKED. The local synthetic merge claim is PR-body evidence only and was not independently verified by a trusted current-head job in the supplied context.
  • unknown — git diff --check on the synthetic merge tree passed.: No trusted current-head diff-check job result was supplied. The statement appears only in the PR body.
  • partial — Previous full nightly on the original #3832 head passed before the accidental merge/revert cycle: https://github.com/NVIDIA/NemoClaw/actions/runs/26144040521: Prior nightly success is stale relative to current head fa28360. Current-head CI is pending and supplied E2E comments do not show all required E2E passing for fa28360.
  • unknown — Do not merge until release approval is explicit.: No explicit release approval evidence was present in the supplied context. GitHub currently reports mergeStateStatus=BLOCKED and reviewDecision=REVIEW_REQUIRED.
  • missing — Preserve the original arguments shape when cloning split tool calls.: Unresolved CodeRabbit thread at nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js:194 states buildSplitToolCalls still uses arguments: { command } and can change schema mid-stream.
  • missing — Don't dedupe safe exec calls across the whole message.: Unresolved CodeRabbit thread at nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js:208 states dedupeSafeExecToolCalls uses a global seenSafeExecCommands set and can drop legitimate later tool calls.
  • missing — CodeQL / Useless assignment to local variable: Unresolved GitHub Advanced Security thread at nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js:303 reports an unused assignment to changed.
  • missing — Guard stdout before calling .includes() in the success path.: Unresolved CodeRabbit thread at src/lib/actions/sandbox/rebuild.ts:834 reports that seedWechatResult.stdout can be undefined while status is 0, causing a runtime throw.
  • missing — Required E2E: openshell-version-pin-e2e: E2E Advisor required this job, but no supplied result shows it passed for exact head SHA fa28360.
  • missing — Required E2E: openshell-gateway-upgrade-e2e: A prior run for 64efd11 passed openshell-gateway-upgrade-e2e, but no supplied result shows it passed for exact head SHA fa28360.
  • missing — Required E2E: cloud-e2e: A prior run for 64efd11 passed cloud-e2e, but no supplied result shows it passed for exact head SHA fa28360.
  • missing — Required E2E: openclaw-plugin-runtime-exdev-e2e: E2E Advisor required this job, but no supplied result shows it passed for exact head SHA fa28360.
  • missing — Required E2E: kimi-inference-compat-e2e: A prior run for 64efd11 passed kimi-inference-compat-e2e, but no supplied result shows it passed for exact head SHA fa28360.
  • missing — Required E2E: messaging-compatible-endpoint-e2e: A prior run for 64efd11 passed messaging-compatible-endpoint-e2e, but no supplied result shows it passed for exact head SHA fa28360.
  • missing — Required E2E: bedrock-runtime-compatible-anthropic-e2e: A prior run for 64efd11 passed bedrock-runtime-compatible-anthropic-e2e, but no supplied result shows it passed for exact head SHA fa28360.
  • missing — Required E2E: openclaw-inference-switch-e2e: Run 26194670119 for prior SHA 64efd11 failed openclaw-inference-switch-e2e; no supplied result shows it passed for exact head SHA fa28360.
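
The rebuild.ts item above concerns a success path where status is 0 but stdout is undefined. A minimal JavaScript sketch of the guard, assuming a child_process.spawnSync-style result object; the helper name seedSucceeded and the marker string are hypothetical and not taken from rebuild.ts:

```javascript
// Hypothetical helper: decide whether a WeChat seed command reported success.
// Assumes a spawnSync-style result ({ status, stdout }), where stdout may be
// undefined, null, a string, or a Buffer depending on how the process was spawned.
function seedSucceeded(result, marker) {
  if (!result || result.status !== 0) return false;
  // Normalize stdout before calling .includes() so a missing stream cannot throw.
  const stdout = result.stdout == null ? "" : String(result.stdout);
  return stdout.includes(marker);
}
```

With this shape, a zero exit status with no captured output is treated as "marker not seen" rather than crashing the rebuild.
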

Security review

  • pass — 1. Secrets and Credentials: No hardcoded secret values were evident in the supplied diff. WeChat, Slack, Telegram, Discord, Brave, gateway, and provider credentials are represented by environment/provider placeholders; Dockerfile clears generated gateway auth from openclaw.json. Workflow uses GitHub token/secrets contexts, which require trusted-boundary review but are not literal committed secrets.
  • warning — 2. Input Validation and Data Sanitization: The PR changes URL/config parsing, OpenClaw stream/tool-call rewriting, WeChat account seeding, plugin metadata discovery, and compiled fetch-guard behavior. There are allowlist-style checks for safe exec splitting and version/env parsing, but unresolved review threads identify correctness issues in tool-call argument encoding and dedupe scope. These paths process untrusted model/tool output, URLs, metadata JSON, and base64 JSON.
  • warning — 3. Authentication and Authorization: The change affects gateway auth/device-auth defaults, WeChat session seeding, OpenClaw/Hermes gateway manifests, messaging channel state, and inherited E2E workflow secrets. No confirmed auth bypass is shown, but the runtime upgrade and pending current-head E2E mean bearer/device-pairing, channel enablement, and credential rewrite behavior need live validation.
  • warning — 4. Dependencies and Third-Party Libraries: OpenClaw, OpenShell, Hermes, @tencent-weixin/openclaw-weixin, uv, and workflow/runtime dependencies are upgraded. Several artifacts are pinned by version/digest/checksum, but current evidence does not include CVE/license review for the new versions, and the blueprint sandbox image digest did not change with the version pins.
  • warning — 5. Error Handling and Logging: Most reviewed logging avoids secrets, and seed-wechat-accounts logs account IDs and file paths rather than tokens. However, an unresolved rebuild success-path bug can throw when stdout is undefined, turning a successful seed command into a runtime failure; this is error-handling correctness in a sandbox rebuild path.
  • pass — 6. Cryptography and Data Protection: Not applicable — no new cryptographic primitives or custom crypto were introduced in this change. Existing checksum/digest pinning patterns remain present for Docker image/base artifacts, gosu, NodeSource, and Hermes tarball paths.
  • warning — 7. Configuration and Security Headers: High-risk configuration changed: OpenShell/OpenClaw version pins, Dockerfiles, workflow E2E, sandbox image digest metadata, plugin enables, proxy/SSRF patching, and device-auth behavior. The unchanged blueprint image digest while version pins changed is a configuration integrity concern until release tooling or tests prove consistency.
  • warning — 8. Security Testing: The PR adds/updates many targeted tests, including fetch guard, OpenClaw config generation, tool catalog patching, Kimi compat, WeChat seeding, policies, installer version checks, and E2E scripts. However, security-critical runtime/network/credential behavior requires current-head E2E, CodeQL, ShellCheck, and image-build results, which are still pending or not evidenced for fa28360.
  • warning — 9. Holistic Security Posture: The PR is broad and touches sandbox escape/SSRF-sensitive patches, installer trust, workflow execution, credential injection, messaging state, runtime dependency supply chain, and sandbox lifecycle. No confirmed exploitable vulnerability was identified from the supplied diff excerpt, but merge should wait for gates and explicit human review of the trust boundaries.

Test / E2E status

  • Test depth: e2e_required — Runtime/sandbox/infrastructure paths need real execution coverage: .agents/skills/nemoclaw-user-reference/references/commands.md, .github/workflows/regression-e2e.yaml, Dockerfile, Dockerfile.base, agents/hermes/Dockerfile.base, agents/hermes/config/hermes-config.ts, agents/hermes/manifest.yaml, agents/openclaw/manifest.yaml. Unit tests are useful but cannot prove host/VM/sandbox network, credential, image, gateway, or workflow behavior.
  • E2E Advisor: missing
  • Required E2E jobs: openshell-version-pin-e2e, openshell-gateway-upgrade-e2e, cloud-e2e, openclaw-plugin-runtime-exdev-e2e, kimi-inference-compat-e2e, messaging-compatible-endpoint-e2e, bedrock-runtime-compatible-anthropic-e2e, openclaw-inference-switch-e2e, sandbox-operations-e2e, rebuild-openclaw-e2e, network-policy-e2e, credential-sanitization-e2e, hermes-e2e, launchable-smoke-e2e
  • Missing for analyzed SHA: openshell-version-pin-e2e, openshell-gateway-upgrade-e2e, cloud-e2e, openclaw-plugin-runtime-exdev-e2e, kimi-inference-compat-e2e, messaging-compatible-endpoint-e2e, bedrock-runtime-compatible-anthropic-e2e, openclaw-inference-switch-e2e, sandbox-operations-e2e, rebuild-openclaw-e2e, network-policy-e2e, credential-sanitization-e2e, hermes-e2e, launchable-smoke-e2e

✅ What looks good

  • Version pins are updated consistently across many source-level references: OpenClaw 2026.5.18, OpenShell 0.0.44, Hermes v2026.5.16, and WeChat plugin 2.4.3.
  • Dockerfile.base pins the node base image by digest and Hermes tarball by SHA-256, preserving important supply-chain controls.
  • OpenClaw patch scripts fail closed for missing expected compiled symbols/patterns instead of silently skipping patch drift.
  • Tests were added or updated for config generation, installer version checks, OpenClaw tool catalog patching, Kimi inference compatibility, OpenClaw agent JSON parsing, WeChat account seeding, policies, snapshot behavior, sandbox provisioning, and fetch-guard patch regression.
  • Docs and user-reference examples were updated to match the new OpenClaw pin.
  • Monolith line counts for src/lib/onboard.ts and src/lib/policy/index.ts are net-zero, so this PR does not grow those existing monoliths.

Review completeness

  • The supplied diff was truncated at 160000 characters; not every changed line in all 51 files was independently reviewed in full.
  • No live command execution, package-manager install, or test execution was performed; results are based on trusted supplied GitHub/diff context and read-only review.
  • Dependency CVE/license status for newly pinned upstream versions was not available in the supplied context.
  • E2E Advisor comments were available, but supplied E2E result comments do not demonstrate all required jobs passing for exact head SHA fa28360.
  • No linked issues were supplied, so linked-issue acceptance could not be mapped beyond PR-body, E2E Advisor, and review-comment clauses.
  • CodeRabbit and GitHub Advanced Security review comments were present and unresolved; final maintainer disposition was not available.
  • Open PR overlap and drift evidence were reviewed from supplied metadata, not by checking out and comparing every overlapping branch.
  • Human maintainer review required: yes

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/test-issue-2478-crash-loop-recovery.sh (1)

278-301: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

wait_for_gateway_up timeout is not wall-clock bounded.

Line 278 can block up to 20s per poll, but Line 301 only increments elapsed by 2s. So wait_for_gateway_up 45 can run for several minutes, making E2E timing behavior misleading and increasing hang risk.

Suggested fix
 gateway_runtime_ready() {
+  local status_timeout="${1:-5}"
   if sandbox_exec sh -c "grep -Fq '[gateway] ready' /tmp/gateway.log 2>/dev/null"; then
     return 0
   fi
@@
   local status_output
-  status_output="$(timeout 20 nemoclaw "$SANDBOX_NAME" status 2>&1)" || true
+  status_output="$(timeout "$status_timeout" nemoclaw "$SANDBOX_NAME" status 2>&1)" || true
@@
 wait_for_gateway_up() {
   local timeout="${1:-30}"
-  local elapsed=0 pid=""
-  while [ "$elapsed" -lt "$timeout" ]; do
+  local pid=""
+  local deadline=$((SECONDS + timeout))
+  while [ "$SECONDS" -lt "$deadline" ]; do
     pid="$(gateway_pid)"
-    if [ -n "$pid" ] && gateway_runtime_ready; then
+    if [ -n "$pid" ] && gateway_runtime_ready 5; then
       echo "$pid"
       return 0
     fi
     sleep 2
-    elapsed=$((elapsed + 2))
   done
   echo ""
   return 1
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-issue-2478-crash-loop-recovery.sh` around lines 278 - 301, The
wait_for_gateway_up function can exceed the requested wall-clock timeout because
gateway_runtime_ready (which invokes timeout 20) may block per iteration; fix by
bounding each loop iteration to the remaining time and accounting for real
elapsed time: compute remaining=$((timeout - elapsed)) at loop start, if
remaining<=0 return 1, then call gateway_runtime_ready via the timeout wrapper
(e.g. timeout "$remaining" gateway_runtime_ready) or modify
gateway_runtime_ready to accept a per-call timeout parameter and pass
min(remaining,20); measure time before/after the call and increment elapsed by
the actual time spent (including sleep) so the while loop truly respects the
wall-clock timeout while still calling gateway_pid and gateway_runtime_ready.
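
The prompt's approach (cap each probe at min(remaining, 20) and measure real elapsed time rather than adding a fixed 2 s) can be illustrated outside the shell. Below is a JavaScript sketch with an injected clock so it can be exercised without a gateway; waitUntil, probe, now, and sleep are hypothetical stand-ins for wait_for_gateway_up, gateway_runtime_ready, $SECONDS, and sleep 2, not code from the PR:

```javascript
// Sketch of a wall-clock-bounded poll loop. `now` returns seconds and `sleep`
// blocks for the given seconds; both are injected so the loop can run
// against a fake clock. `probe(budgetSec)` is expected to give up within its budget.
function waitUntil(probe, { timeoutSec, maxPerCallSec = 20, sleepSec = 2, now, sleep }) {
  const deadline = now() + timeoutSec;
  while (now() < deadline) {
    // Never hand a probe more time than the overall deadline leaves.
    const budget = Math.min(deadline - now(), maxPerCallSec);
    if (probe(budget)) return true;
    // Clamp the back-off sleep to the remaining time as well.
    sleep(Math.min(sleepSec, Math.max(0, deadline - now())));
  }
  return false;
}
```

Because the per-probe budget shrinks as the deadline approaches, total elapsed time stays within timeoutSec as long as each probe honors its budget; in the shell version, the timeout wrapper provides that guarantee.
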
🧹 Nitpick comments (5)
src/lib/onboard.ts (1)

2924-2925: ⚡ Quick win

Avoid duplicating the OpenShell fallback version string in multiple places.

"0.0.44" is now hardcoded in both fallback selection and install guidance. Please derive this from one shared value (e.g., blueprint max with a single constant fallback) so future bumps can’t desync runtime behavior vs user instructions.

Proposed refactor
+const DEFAULT_OPENSHELL_FALLBACK_VERSION = "0.0.44";
+
 function getOpenShellDockerSupervisorImage(versionOutput: string | null = null): string {
@@
-  const supportedVersion = installedVersion ?? getBlueprintMaxOpenshellVersion() ?? "0.0.44";
+  const supportedVersion =
+    installedVersion ?? getBlueprintMaxOpenshellVersion() ?? DEFAULT_OPENSHELL_FALLBACK_VERSION;
   return `ghcr.io/nvidia/openshell/supervisor:${supportedVersion}`;
 }
@@
-    console.error("  Install OpenShell v0.0.44, or set NEMOCLAW_OPENSHELL_GATEWAY_BIN.");
+    console.error(
+      `  Install OpenShell v${getBlueprintMaxOpenshellVersion() ?? DEFAULT_OPENSHELL_FALLBACK_VERSION}, or set NEMOCLAW_OPENSHELL_GATEWAY_BIN.`,
+    );

Also applies to: 4174-4174

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard.ts` around lines 2924 - 2925, The code duplicates the
OpenShell fallback version string ("0.0.44") in multiple places; define a single
shared constant (e.g., DEFAULT_OPENSHELL_VERSION) and use it as the final
fallback when computing supportedVersion (replace the inline "0.0.44" in the
installedVersion ?? getBlueprintMaxOpenshellVersion() ?? "0.0.44" expression)
and also use that same constant wherever install guidance or other logic emits
the hardcoded version (locations referencing getBlueprintMaxOpenshellVersion
fallback such as the compute of supportedVersion and the guidance at the other
occurrence around line 4174); update references to use the constant so future
bumps change only one symbol.
test/e2e/lib/slack-api-proof.sh (1)

157-539: Run the two Slack-focused nightly E2E jobs for this change (messaging-providers-e2e, openclaw-slack-pairing-e2e) to validate both OpenClaw-backed and hermetic proof paths after the fallback/root-resolution refactor.

As per coding guidelines, test/e2e/lib/slack-api-proof.sh: “E2E test recommendation: - messaging-providers-e2e ... - openclaw-slack-pairing-e2e ...”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/lib/slack-api-proof.sh` around lines 157 - 539, Trigger the two
Slack-focused nightly E2E jobs—messaging-providers-e2e and
openclaw-slack-pairing-e2e—to validate both the OpenClaw-backed path and the
hermetic fallback after the fallback/root-resolution refactor; run the jobs and
confirm the code paths exercised by run_fake_slack_channel_mention_proof (which
may call runOpenClawPrivateProof when resolveOpenClawRoot finds OpenClaw,
otherwise runHermeticSlackProof) both succeed and produce the expected proof
outputs.
test/e2e/test-openclaw-inference-switch.sh (1)

256-284: Given the switch-path assertion changed its OpenClaw invocation/parsing behavior, run openclaw-inference-switch-e2e to confirm route/config/live-request parity stays green.

As per coding guidelines, test/e2e/test-openclaw-inference-switch.sh: “E2E test recommendation: - openclaw-inference-switch-e2e ...”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-openclaw-inference-switch.sh` around lines 256 - 284, The test
changed OpenClaw invocation/parsing and needs an end-to-end run to verify
route/config/live-request parity; run the e2e suite named
openclaw-inference-switch-e2e and confirm the behavior in the
check_openclaw_agent_turn function still produces the expected numeric reply
(42) when invoking openclaw agent --json with the same session-id/command,
updating the test if the invocation/output format changed so python3
"${E2E_DIR}/lib/openclaw-agent-json.py" still parses the output correctly and
the grep check for 42 remains valid.
test/e2e/test-bedrock-runtime-compatible-anthropic.sh (1)

690-708: Given this parser-path change sits on the Bedrock-compatible agent-turn assertion, run bedrock-runtime-compatible-anthropic-e2e in nightly-e2e for regression confidence.

As per coding guidelines, test/e2e/test-bedrock-runtime-compatible-anthropic.sh: “E2E test recommendation: - bedrock-runtime-compatible-anthropic-e2e ...”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-bedrock-runtime-compatible-anthropic.sh` around lines 690 -
708, The Bedrock-compatible OpenClaw agent E2E (check_openclaw_agent_turn)
depends on the new parser-path change and needs nightly regression coverage; add
the `bedrock-runtime-compatible-anthropic-e2e` job to the nightly-e2e CI
pipeline (or include it in the nightly test matrix) so this test is executed
every night, ensuring the `check_openclaw_agent_turn` scenario is exercised
under the nightly run; make sure the job name exactly matches
`bedrock-runtime-compatible-anthropic-e2e` and that it runs the test file
`test/e2e/test-bedrock-runtime-compatible-anthropic.sh`.
test/e2e/test-messaging-compatible-endpoint.sh (1)

494-534: Run messaging-compatible-endpoint-e2e for this change, since it directly touches the Telegram + compatible-endpoint OpenClaw turn assertion path.

As per coding guidelines, test/e2e/test-messaging-compatible-endpoint.sh: “E2E test recommendation: - messaging-compatible-endpoint-e2e ...”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-messaging-compatible-endpoint.sh` around lines 494 - 534, The
change touches the OpenClaw turn path exercised by the check_openclaw_agent_turn
function; run the messaging-compatible-endpoint-e2e end-to-end suite (which
includes test/e2e/test-messaging-compatible-endpoint.sh) to validate the
Telegram + compatible-endpoint OpenClaw turn assertions and ensure
check_openclaw_agent_turn still passes (fix any failures revealed by that run).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js`:
- Around line 188-194: The split tool-call builder currently replaces the
original arguments with { command } causing schema drift; update
buildSplitToolCalls (and the other split-call sites referenced) to preserve the
original arguments shape by using the existing encodeToolCallArgumentsLike
helper: derive the new arguments by calling
encodeToolCallArgumentsLike(toolCall.arguments, command) (or the equivalent
helper used elsewhere) instead of hardcoding { command }, keep id creation via
buildSplitToolCallId and name "exec", and ensure message.content /
partial.content / event.toolCall retain the original argument schema across
split events.
- Around line 197-208: The function dedupeSafeExecToolCalls uses a single global
seenSafeExecCommands set which causes safe commands like hostname/date/uptime
from different tool invocations to be dropped; change the logic so the set is
cleared per original exec tool call rather than across the whole message: track
a per-call identifier (e.g. derive a call id from each block (use an existing
identifier on the block or add getExecToolCallId(block)), maintain currentCallId
and reset seenSafeExecCommands when the call id changes, and continue to use
SAFE_SPLIT_EXEC_COMMANDS and getExecToolCallCommand to decide which commands to
dedupe; update dedupeSafeExecToolCalls (and the other similar spots referenced)
to use this per-call dedupe behavior.

---

Outside diff comments:
In `@test/e2e/test-issue-2478-crash-loop-recovery.sh`:
- Around line 278-301: The wait_for_gateway_up function can exceed the requested
wall-clock timeout because gateway_runtime_ready (which invokes timeout 20) may
block per iteration; fix by bounding each loop iteration to the remaining time
and accounting for real elapsed time: compute remaining=$((timeout - elapsed))
at loop start, if remaining<=0 return 1, then call gateway_runtime_ready via the
timeout wrapper (e.g. timeout "$remaining" gateway_runtime_ready) or modify
gateway_runtime_ready to accept a per-call timeout parameter and pass
min(remaining,20); measure time before/after the call and increment elapsed by
the actual time spent (including sleep) so the while loop truly respects the
wall-clock timeout while still calling gateway_pid and gateway_runtime_ready.

---

Nitpick comments:
In `@src/lib/onboard.ts`:
- Around line 2924-2925: The code duplicates the OpenShell fallback version
string ("0.0.44") in multiple places; define a single shared constant (e.g.,
DEFAULT_OPENSHELL_VERSION) and use it as the final fallback when computing
supportedVersion (replace the inline "0.0.44" in the installedVersion ??
getBlueprintMaxOpenshellVersion() ?? "0.0.44" expression) and also use that same
constant wherever install guidance or other logic emits the hardcoded version
(locations referencing getBlueprintMaxOpenshellVersion fallback such as the
compute of supportedVersion and the guidance at the other occurrence around line
4174); update references to use the constant so future bumps change only one
symbol.

In `@test/e2e/lib/slack-api-proof.sh`:
- Around line 157-539: Trigger the two Slack-focused nightly E2E
jobs—messaging-providers-e2e and openclaw-slack-pairing-e2e—to validate both the
OpenClaw-backed path and the hermetic fallback after the
fallback/root-resolution refactor; run the jobs and confirm the code paths
exercised by run_fake_slack_channel_mention_proof (which may call
runOpenClawPrivateProof when resolveOpenClawRoot finds OpenClaw, otherwise
runHermeticSlackProof) both succeed and produce the expected proof outputs.

In `@test/e2e/test-bedrock-runtime-compatible-anthropic.sh`:
- Around line 690-708: The Bedrock-compatible OpenClaw agent E2E
(check_openclaw_agent_turn) depends on the new parser-path change and needs
nightly regression coverage; add the `bedrock-runtime-compatible-anthropic-e2e`
job to the nightly-e2e CI pipeline (or include it in the nightly test matrix) so
this test is executed every night, ensuring the `check_openclaw_agent_turn`
scenario is exercised under the nightly run; make sure the job name exactly
matches `bedrock-runtime-compatible-anthropic-e2e` and that it runs the test
file `test/e2e/test-bedrock-runtime-compatible-anthropic.sh`.

In `@test/e2e/test-messaging-compatible-endpoint.sh`:
- Around line 494-534: The change touches the OpenClaw turn path exercised by
the check_openclaw_agent_turn function; run the
messaging-compatible-endpoint-e2e end-to-end suite (which includes
test/e2e/test-messaging-compatible-endpoint.sh) to validate the Telegram +
compatible-endpoint OpenClaw turn assertions and ensure
check_openclaw_agent_turn still passes (fix any failures revealed by that run).

In `@test/e2e/test-openclaw-inference-switch.sh`:
- Around line 256-284: The test changed OpenClaw invocation/parsing and needs an
end-to-end run to verify route/config/live-request parity; run the e2e suite
named openclaw-inference-switch-e2e and confirm the behavior in the
check_openclaw_agent_turn function still produces the expected numeric reply
(42) when invoking openclaw agent --json with the same session-id/command,
updating the test if the invocation/output format changed so python3
"${E2E_DIR}/lib/openclaw-agent-json.py" still parses the output correctly and
the grep check for 42 remains valid.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3fca2a0f-1dd0-4b82-888c-43a34c1e5e30

📥 Commits

Reviewing files that changed from the base of the PR and between e2c88b3 and cd41b2b.

📒 Files selected for processing (48)
  • .agents/skills/nemoclaw-user-reference/references/commands.md
  • .github/workflows/regression-e2e.yaml
  • Dockerfile
  • Dockerfile.base
  • agents/hermes/Dockerfile.base
  • agents/hermes/config/hermes-config.ts
  • agents/hermes/manifest.yaml
  • agents/openclaw/manifest.yaml
  • docs/reference/commands.mdx
  • nemoclaw-blueprint/blueprint.yaml
  • nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js
  • nemoclaw/package.json
  • nemoclaw/src/package-metadata.test.ts
  • scripts/brev-launchable-ci-cpu.sh
  • scripts/generate-openclaw-config.py
  • scripts/install-openshell.sh
  • scripts/patch-openclaw-tool-catalog.js
  • scripts/seed-wechat-accounts.py
  • src/ext/wechat/qr.ts
  • src/lib/onboard.ts
  • src/lib/onboard/docker-driver-gateway-runtime-marker.test.ts
  • src/lib/onboard/openshell-install.ts
  • src/lib/policy/index.ts
  • src/lib/sandbox/version.test.ts
  • src/lib/verify-deployment.test.ts
  • test/e2e-test.sh
  • test/e2e/docs/parity-inventory.generated.json
  • test/e2e/docs/parity-map.yaml
  • test/e2e/lib/openclaw-agent-json.py
  • test/e2e/lib/slack-api-proof.sh
  • test/e2e/test-bedrock-runtime-compatible-anthropic.sh
  • test/e2e/test-brave-search-e2e.sh
  • test/e2e/test-full-e2e.sh
  • test/e2e/test-issue-2478-crash-loop-recovery.sh
  • test/e2e/test-launchable-smoke.sh
  • test/e2e/test-messaging-compatible-endpoint.sh
  • test/e2e/test-openclaw-inference-switch.sh
  • test/e2e/test-openshell-gateway-upgrade.sh
  • test/e2e/test-openshell-version-pin.sh
  • test/e2e/test-sandbox-operations.sh
  • test/fetch-guard-patch-regression.test.ts
  • test/generate-openclaw-config.test.ts
  • test/install-openshell-version-check.test.ts
  • test/kimi-inference-compat-plugin.test.ts
  • test/openclaw-agent-json.test.ts
  • test/openclaw-tool-catalog-patch.test.ts
  • test/policies.test.ts
  • test/seed-wechat-accounts.test.ts

Comment on lines +188 to 194
  function buildSplitToolCalls(toolCall, commands) {
    return commands.map((command, index) => ({
      type: "toolCall",
      id: buildSplitToolCallId(toolCall.id, index, command),
      name: "exec",
      arguments: { command },
    }));
  }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the original arguments shape when cloning split tool calls.

encodeToolCallArgumentsLike() fixes event.delta, but every split tool call still gets { command } here. If the source block used stringified JSON arguments, message.content, partial.content, and event.toolCall silently change schema mid-stream on the same path.

Proposed fix
 function buildSplitToolCalls(toolCall, commands) {
   return commands.map((command, index) => ({
     type: "toolCall",
     id: buildSplitToolCallId(toolCall.id, index, command),
     name: "exec",
-    arguments: { command },
+    arguments: encodeToolCallArgumentsLike(toolCall.arguments, command),
   }));
 }

Also applies to: 219-219, 245-250, 309-309
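A minimal sketch of the suggested fix, to make the schema-preservation point concrete. The bodies of encodeToolCallArgumentsLike and buildSplitToolCallId are not shown in this thread, so the versions below are hypothetical stand-ins. The encoder assumes arguments are either a JSON string or a plain object. The ID helper just suffixes the original ID with the split index.

```javascript
// Hypothetical stand-in for the plugin's encodeToolCallArgumentsLike():
// re-encode { command } in the same shape as the original arguments.
function encodeToolCallArgumentsLike(originalArguments, command) {
  const args = { command };
  // The source block carried stringified JSON, so keep the split calls stringified too.
  return typeof originalArguments === "string" ? JSON.stringify(args) : args;
}

// Hypothetical ID helper; the real buildSplitToolCallId may differ.
// `command` is accepted only to match the call site and is unused here.
function buildSplitToolCallId(baseId, index, command) {
  return `${baseId}-split-${index}`;
}

function buildSplitToolCalls(toolCall, commands) {
  return commands.map((command, index) => ({
    type: "toolCall",
    id: buildSplitToolCallId(toolCall.id, index, command),
    name: "exec",
    // Preserve the source block's arguments shape instead of always emitting an object.
    arguments: encodeToolCallArgumentsLike(toolCall.arguments, command),
  }));
}

const stringArgsCall = { id: "call_1", arguments: '{"command":"hostname && date"}' };
const objectArgsCall = { id: "call_2", arguments: { command: "hostname && date" } };

const fromString = buildSplitToolCalls(stringArgsCall, ["hostname", "date"]);
const fromObject = buildSplitToolCalls(objectArgsCall, ["hostname", "date"]);
console.log(fromString[0].arguments); // {"command":"hostname"}
console.log(fromObject[1].arguments); // { command: 'date' }
```

With this shape, message.content, partial.content, and event.toolCall all see the same argument encoding as the original block.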

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js` around
lines 188 - 194, The split tool-call builder currently replaces the original
arguments with { command } causing schema drift; update buildSplitToolCalls (and
the other split-call sites referenced) to preserve the original arguments shape
by using the existing encodeToolCallArgumentsLike helper: derive the new
arguments by calling encodeToolCallArgumentsLike(toolCall.arguments, command)
(or the equivalent helper used elsewhere) instead of hardcoding { command },
keep id creation via buildSplitToolCallId and name "exec", and ensure
message.content / partial.content / event.toolCall retain the original argument
schema across split events.

Comment on lines +197 to +208
function dedupeSafeExecToolCalls(content) {
  const seenSafeExecCommands = new Set();
  const deduped = [];
  for (const block of content) {
    const command = getExecToolCallCommand(block);
    if (SAFE_SPLIT_EXEC_COMMANDS.has(command)) {
      if (seenSafeExecCommands.has(command)) continue;
      seenSafeExecCommands.add(command);
    }
    deduped.push(block);
  }
  return deduped;
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't dedupe safe exec calls across the whole message.

This helper remembers hostname / date / uptime globally, so once one hostname survives, every later hostname block in the same message gets dropped—even if it came from a different tool call. That loses legitimate tool invocations instead of just cleaning up duplicates introduced by one split.

Proposed fix
 function rewriteSafeCombinedExecToolCallsInContent(content) {
   if (!Array.isArray(content)) return { changed: false, content };

   let changed = false;
   const expanded = [];
   for (const block of content) {
     const split = getSafeCombinedExecToolCallFromBlock(block);
     if (split) {
-      expanded.push(...buildSplitToolCalls(split.toolCall, split.commands));
+      expanded.push(...dedupeSafeExecToolCalls(buildSplitToolCalls(split.toolCall, split.commands)));
       changed = true;
     } else {
       expanded.push(block);
     }
   }
   if (!changed) return { changed: false, content };

-  return { changed: true, content: dedupeSafeExecToolCalls(expanded) };
+  return { changed: true, content: expanded };
 }

 function applySafeExecSplitAtContentIndex(message, split) {
   if (!message || typeof message !== "object" || !Array.isArray(message.content) || !split) {
     return false;
   }
   const index = Number.isInteger(split.contentIndex) ? split.contentIndex : 0;
   if (index < 0 || index >= message.content.length) return false;
-  const replacement = buildSplitToolCalls(split.toolCall, split.commands);
-  message.content = dedupeSafeExecToolCalls([
+  const replacement = dedupeSafeExecToolCalls(buildSplitToolCalls(split.toolCall, split.commands));
+  message.content = [
     ...message.content.slice(0, index),
     ...replacement,
     ...message.content.slice(index + 1),
-  ]);
+  ];
   if (message.stopReason === "stop") message.stopReason = "toolUse";
   return true;
 }

Also applies to: 227-227, 246-250
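The sketch below contrasts the two behaviors to show why the suggested change scopes deduplication to a single split. SAFE_SPLIT_EXEC_COMMANDS and getExecToolCallCommand are hypothetical reconstructions. The set holds only the three commands this review mentions, and blocks are assumed to carry object-shaped arguments.

```javascript
// Hypothetical: the real set in kimi-inference-compat may contain more commands.
const SAFE_SPLIT_EXEC_COMMANDS = new Set(["hostname", "date", "uptime"]);

// Hypothetical accessor: returns the exec command for an exec tool-call block, else undefined.
function getExecToolCallCommand(block) {
  if (!block || block.type !== "toolCall" || block.name !== "exec") return undefined;
  return block.arguments?.command;
}

// Same logic as the helper under review: drop repeated safe commands within the given list.
function dedupeSafeExecToolCalls(content) {
  const seenSafeExecCommands = new Set();
  const deduped = [];
  for (const block of content) {
    const command = getExecToolCallCommand(block);
    if (SAFE_SPLIT_EXEC_COMMANDS.has(command)) {
      if (seenSafeExecCommands.has(command)) continue;
      seenSafeExecCommands.add(command);
    }
    deduped.push(block);
  }
  return deduped;
}

const exec = (id, command) => ({ type: "toolCall", id, name: "exec", arguments: { command } });

// Two independent splits, each of which emits "hostname".
const splitA = [exec("a-0", "hostname"), exec("a-1", "hostname"), exec("a-2", "date")];
const splitB = [exec("b-0", "hostname")];

// Message-wide dedupe (before the fix): b-0 is dropped because a-0 already used "hostname".
const globalResult = dedupeSafeExecToolCalls([...splitA, ...splitB]);
// Per-split dedupe (the suggested fix): only the duplicate inside splitA is removed.
const perSplitResult = [...dedupeSafeExecToolCalls(splitA), ...dedupeSafeExecToolCalls(splitB)];

console.log(globalResult.map((b) => b.id)); // [ 'a-0', 'a-2' ]
console.log(perSplitResult.map((b) => b.id)); // [ 'a-0', 'a-2', 'b-0' ]
```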

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js` around
lines 197 - 208, The function dedupeSafeExecToolCalls uses a single global
seenSafeExecCommands set which causes safe commands like hostname/date/uptime
from different tool invocations to be dropped; change the logic so the set is
cleared per original exec tool call rather than across the whole message: track
a per-call identifier (e.g. derive a call id from each block, using an existing
identifier on the block or adding getExecToolCallId(block)), maintain currentCallId
and reset seenSafeExecCommands when the call id changes, and continue to use
SAFE_SPLIT_EXEC_COMMANDS and getExecToolCallCommand to decide which commands to
dedupe; update dedupeSafeExecToolCalls (and the other similar spots referenced)
to use this per-call dedupe behavior.

@github-actions

Selective E2E Results — ❌ Some jobs failed

Run: 26188581699
Target ref: cd41b2b4f5ebf6758a62f4c3036c6817f244db11
Workflow ref: main
Requested jobs: cloud-e2e,cloud-onboard-e2e,openshell-gateway-upgrade-e2e,kimi-inference-compat-e2e,messaging-compatible-endpoint-e2e,openclaw-inference-switch-e2e,network-policy-e2e,sandbox-operations-e2e,launchable-smoke-e2e,hermes-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
cloud-e2e ⚠️ cancelled
cloud-onboard-e2e ❌ failure
hermes-e2e ⚠️ cancelled
kimi-inference-compat-e2e ⚠️ cancelled
launchable-smoke-e2e ⚠️ cancelled
messaging-compatible-endpoint-e2e ⚠️ cancelled
network-policy-e2e ⚠️ cancelled
openclaw-inference-switch-e2e ⚠️ cancelled
openshell-gateway-upgrade-e2e ⚠️ cancelled
sandbox-operations-e2e ⚠️ cancelled

Failed jobs: cloud-onboard-e2e. Check run artifacts for logs.


if (deltaSplit) {
if (!partialChanged) changed = applySafeExecSplitAtContentIndex(event.partial, deltaSplit) || changed;
if (!messageChanged) changed = applySafeExecSplitAtContentIndex(event.message, deltaSplit) || changed;
@github-actions

Selective E2E Results — ✅ All requested jobs passed

Run: 26194607183
Target ref: 64efd118bbac9a4407bda405ebb54d0664432764
Workflow ref: main
Requested jobs: cloud-e2e,messaging-providers-e2e,channels-stop-start-e2e,sandbox-operations-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job Result
channels-stop-start-e2e ⚠️ cancelled
cloud-e2e ⚠️ cancelled
messaging-providers-e2e ⚠️ cancelled
sandbox-operations-e2e ⚠️ cancelled

@github-actions

Selective E2E Results — ❌ Some jobs failed

Run: 26194670119
Target ref: 64efd118bbac9a4407bda405ebb54d0664432764
Workflow ref: main
Requested jobs: openshell-gateway-upgrade-e2e,cloud-e2e,kimi-inference-compat-e2e,messaging-compatible-endpoint-e2e,bedrock-runtime-compatible-anthropic-e2e,openclaw-inference-switch-e2e,sandbox-operations-e2e,rebuild-openclaw-e2e,network-policy-e2e,credential-sanitization-e2e,hermes-e2e,launchable-smoke-e2e
Summary: 11 passed, 1 failed, 0 skipped

Job Result
bedrock-runtime-compatible-anthropic-e2e ✅ success
cloud-e2e ✅ success
credential-sanitization-e2e ✅ success
hermes-e2e ✅ success
kimi-inference-compat-e2e ✅ success
launchable-smoke-e2e ✅ success
messaging-compatible-endpoint-e2e ✅ success
network-policy-e2e ✅ success
openclaw-inference-switch-e2e ❌ failure
openshell-gateway-upgrade-e2e ✅ success
rebuild-openclaw-e2e ✅ success
sandbox-operations-e2e ✅ success

Failed jobs: openclaw-inference-switch-e2e. Check run artifacts for logs.

@github-actions

Selective E2E Results — ✅ All requested jobs passed

Run: 26194838115
Target ref: 64efd118bbac9a4407bda405ebb54d0664432764
Workflow ref: main
Requested jobs: messaging-providers-e2e,channels-stop-start-e2e,channels-add-remove-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
channels-add-remove-e2e ✅ success
channels-stop-start-e2e ✅ success
messaging-providers-e2e ✅ success

@github-actions

Selective E2E Results — ✅ All requested jobs passed

Run: 26196765611
Target ref: fa28360ebd0df3542f51a62c99aa6e131e1e3e35
Workflow ref: main
Requested jobs: cloud-e2e,openshell-gateway-upgrade-e2e,launchable-smoke-e2e,sandbox-operations-e2e,rebuild-openclaw-e2e,upgrade-stale-sandbox-e2e,hermes-e2e,rebuild-hermes-e2e,kimi-inference-compat-e2e,messaging-compatible-endpoint-e2e,bedrock-runtime-compatible-anthropic-e2e,openclaw-inference-switch-e2e,network-policy-e2e,credential-sanitization-e2e
Summary: 14 passed, 0 failed, 0 skipped

Job Result
bedrock-runtime-compatible-anthropic-e2e ✅ success
cloud-e2e ✅ success
credential-sanitization-e2e ✅ success
hermes-e2e ✅ success
kimi-inference-compat-e2e ✅ success
launchable-smoke-e2e ✅ success
messaging-compatible-endpoint-e2e ✅ success
network-policy-e2e ✅ success
openclaw-inference-switch-e2e ✅ success
openshell-gateway-upgrade-e2e ✅ success
rebuild-hermes-e2e ✅ success
rebuild-openclaw-e2e ✅ success
sandbox-operations-e2e ✅ success
upgrade-stale-sandbox-e2e ✅ success

@github-actions

Selective E2E Results — ❌ Some jobs failed

Run: 26196857472
Target ref: fa28360ebd0df3542f51a62c99aa6e131e1e3e35
Workflow ref: main
Requested jobs: all (no filter)
Summary: 45 passed, 2 failed, 2 skipped

Job Result
bedrock-runtime-compatible-anthropic-e2e ✅ success
brave-search-e2e ✅ success
channels-add-remove-e2e ✅ success
channels-stop-start-e2e ✅ success
cloud-e2e ❌ failure
cloud-inference-e2e ✅ success
cloud-onboard-e2e ✅ success
credential-migration-e2e ✅ success
credential-sanitization-e2e ✅ success
device-auth-health-e2e ✅ success
diagnostics-e2e ✅ success
docs-validation-e2e ✅ success
double-onboard-e2e ✅ success
gpu-double-onboard-e2e ⏭️ skipped
gpu-e2e ⏭️ skipped
hermes-discord-e2e ✅ success
hermes-e2e ✅ success
hermes-inference-switch-e2e ✅ success
hermes-onboard-security-posture-e2e ✅ success
hermes-slack-e2e ✅ success
inference-routing-e2e ✅ success
issue-2478-crash-loop-recovery-e2e ✅ success
kimi-inference-compat-e2e ✅ success
launchable-smoke-e2e ❌ failure
messaging-compatible-endpoint-e2e ✅ success
messaging-providers-e2e ✅ success
network-policy-e2e ✅ success
onboard-negative-paths-e2e ✅ success
onboard-repair-e2e ✅ success
onboard-resume-e2e ✅ success
openclaw-inference-switch-e2e ✅ success
openclaw-onboard-security-posture-e2e ✅ success
openclaw-slack-pairing-e2e ✅ success
openshell-gateway-upgrade-e2e ✅ success
overlayfs-autofix-e2e ✅ success
rebuild-hermes-e2e ✅ success
rebuild-hermes-stale-base-e2e ✅ success
rebuild-openclaw-e2e ✅ success
runtime-overrides-e2e ✅ success
sandbox-operations-e2e ✅ success
sandbox-survival-e2e ✅ success
shields-config-e2e ✅ success
skill-agent-e2e ✅ success
snapshot-commands-e2e ✅ success
state-backup-restore-e2e ✅ success
telegram-injection-e2e ✅ success
token-rotation-e2e ✅ success
tunnel-lifecycle-e2e ✅ success
upgrade-stale-sandbox-e2e ✅ success

Failed jobs: cloud-e2e, launchable-smoke-e2e. Check run artifacts for logs.

@github-actions

Selective E2E Results — ✅ All requested jobs passed

Run: 26198339046
Target ref: fa28360ebd0df3542f51a62c99aa6e131e1e3e35
Workflow ref: main
Requested jobs: cloud-e2e,launchable-smoke-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
cloud-e2e ✅ success
launchable-smoke-e2e ✅ success

@wangericnv

QA test report — DGX Spark + DGX Station ARM64

TL;DR: Image-managed extension preservation (T6047546) passes on 3/3 sandboxes across 2/2 ARM64 boxes. However, a separate post-rebuild gateway-init regression reproduces consistently with this PR and does not reproduce on main. It is filed as NVBug #6198894 and linked here as #3975.

What was tested

  • DGX Spark (aarch64), sandbox my-assistant: pre-PR NemoClaw v0.0.46 with OpenClaw 2026.4.24; post-PR rebuild OpenClaw 2026.5.18 ✓; T6047546 (extension preservation): nemoclaw/dist/index.js sha256 unchanged
  • DGX Station GB300 (aarch64), sandbox discord-sb: pre-PR OpenClaw 2026.4.24; post-PR rebuild OpenClaw 2026.5.18 ✓; T6047546 (extension preservation): ✅ same sha256 preserved
  • DGX Station GB300 (aarch64), sandbox gemini-sb: pre-PR OpenClaw 2026.4.24; post-PR rebuild OpenClaw 2026.5.18 ✓; T6047546 (extension preservation): ✅ same sha256 preserved

The N1X (Snapdragon X ARM64, WSL2) test is in progress. I will follow up in this thread once it completes.

The regression — #3975 / NVB#6198894

After the rebuild, the in-sandbox OpenClaw gateway never starts. All 3 sandboxes above end up in the Up X min (unhealthy) Docker state, and the rebuild's own deployment-verify step reports:

✗ gateway: HTTP 0 (gateway not responding)
✗ dashboard: port forward not working (connection refused)

The only line that ever appears in /tmp/gateway.log inside the container is:

[gateway-recovery] WARNING: /tmp/nemoclaw-proxy-env.sh missing - gateway launching without library guards (#2478)

There is no [gateway] ready line, no plugin registration, and no HTTP server start. The sandbox is functionally unusable after the rebuild.

Negative control on same Spark box

I installed main HEAD (cfa817b, which still bundles OpenClaw 2026.4.24) on top of the PR-installed v0.1.0 and then rebuilt. The result is healthy:

✓ Deployment verified — gateway and dashboard are healthy.
  OpenClaw version: 2026.4.24

The gateway log contains the full init sequence and a [gateway] ready line (4 plugins: browser, device-pair, phone-control, talk-voice; 1.6s).

So the regression is specific to the PR's OpenClaw 2026.5.18 path and does not already exist on main. I suspect the PR drops the nemoclaw-proxy-env.sh setup step that the gateway-recovery path now warns about. The warning itself references closed issue #2478.
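A small triage helper can classify a captured /tmp/gateway.log, which is useful when re-running this PR-vs-main comparison on other boxes. This is a hypothetical sketch. It only string-matches the two markers quoted in this report: the [gateway-recovery] proxy-env warning and the [gateway] ready line. It also assumes the log has already been copied out of the container as plain text.

```javascript
// Hypothetical triage helper; both marker strings come from the log lines quoted in this report.
const PROXY_ENV_WARNING = "/tmp/nemoclaw-proxy-env.sh missing";
const READY_MARKER = "[gateway] ready";

function classifyGatewayLog(logText) {
  if (logText.includes(READY_MARKER)) return "healthy";
  // No ready line: report whether the proxy-env warning was seen.
  return logText.includes(PROXY_ENV_WARNING)
    ? "stalled-after-proxy-env-warning"
    : "stalled-unknown";
}

// The PR sample is the line quoted above; the main sample is illustrative, not a verbatim log.
const prLog =
  "[gateway-recovery] WARNING: /tmp/nemoclaw-proxy-env.sh missing - gateway launching without library guards (#2478)\n";
const mainLog = "[gateway] ready\n";

console.log(classifyGatewayLog(prLog)); // stalled-after-proxy-env-warning
console.log(classifyGatewayLog(mainLog)); // healthy
```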

Other observations (not blocking)

  • After the rebuild, sandboxes.json keeps openshellVersion: "0.0.39" even though the PR pins OpenShell 0.0.44. This is metadata drift only; the image itself is new.
  • nemoclaw plugin register on the main rebuild also throws SyntaxError: Unexpected end of JSON input from /sandbox/.openclaw/extensions/nemoclaw/dist/index.js. This error is pre-existing on main, not introduced by this PR, and is worth a separate look later.
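On the second observation: in Node, "Unexpected end of JSON input" is the SyntaxError that JSON.parse typically raises on empty or truncated input. One hypothesis worth checking is therefore that plugin register parses an empty payload. A guarded parse like the one below would name the offending source in the error. parseJsonWithSource and its label argument are hypothetical and illustrative; they are not part of the nemoclaw code.

```javascript
// Hypothetical helper: parse JSON, but say which source was empty or malformed.
function parseJsonWithSource(text, label) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error(`${label}: expected JSON but got empty input`);
  }
  try {
    return JSON.parse(text);
  } catch (err) {
    // Keep the original SyntaxError message and prefix it with the source label.
    throw new Error(`${label}: ${err.message}`, { cause: err });
  }
}

console.log(parseJsonWithSource('{"ok":true}', "example-source").ok); // true

try {
  parseJsonWithSource("", "example-source");
} catch (err) {
  console.log(err.message); // example-source: expected JSON but got empty input
}
```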

DevTest evidence

  • 597724 (Spark)
  • 597725 (Station discord-sb)
  • 597726 (Station gemini-sb)

Under \DevTest\NemoClaw\NemoClaw Test\v0.0.47\manual\{spark,station,n1x}.


Recommendation: investigate why OpenClaw 2026.5.18 doesn't bring up /tmp/nemoclaw-proxy-env.sh on aarch64. This PR should not merge until #3975 is resolved.


Labels

CI/CD Use this label to identify issues with NemoClaw CI/CD pipeline or GitHub Actions. dependencies Pull requests that update a dependency file E2E End-to-end testing — Brev infrastructure, test cases, nightly failures, and coverage gaps Integration: Hermes Integration: OpenClaw Support for OpenClaw OpenShell Support for OpenShell, a safe, private runtime for autonomous AI agents v0.0.49 Release target

4 participants