chore: upgrade agent runtime dependencies #3925
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (2)
📝 Walkthrough
This PR upgrades OpenClaw (2026.4.24→2026.5.18), OpenShell (0.0.39→0.0.44), Hermes (2026.4.23→2026.5.16), and the WeChat plugin (2.4.2→2.4.3). It also:
- introduces enhanced WeChat metadata discovery with dynamic channel registration,
- refactors Kimi inference plugin exec-call splitting with content-index awareness,
- improves E2E test infrastructure with a shared JSON parser utility,
- strengthens sandbox state restoration with image-managed extension preservation.
Changes
Version upgrade and plugin integration enhancement
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add
🌿 Preview your docs: https://nvidia-preview-pr-3925.docs.buildwithfern.com/nemoclaw
E2E Advisor Recommendation
Required E2E:
Dispatch hint:
Auto-dispatched E2E:
Full advisor summary
E2E Recommendation Advisor
Base:
Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
PR Review Advisor
Recommendation: blocked
This is an automated advisory review. A human maintainer must make the final merge decision.
Limitations:
- The supplied diff was truncated at 160000 characters; not every changed line in all 51 files was independently reviewed in full.
- No live command execution, package-manager install, or test execution was performed; results are based on trusted supplied GitHub/diff context and read-only review.
- Dependency CVE/license status for newly pinned upstream versions was not available in the supplied context.
- E2E Advisor comments were available, but supplied E2E result comments do not demonstrate all required jobs passing for exact head SHA fa28360.
- No linked issues were supplied, so linked-issue acceptance could not be mapped beyond PR-body, E2E Advisor, and review-comment clauses.
- CodeRabbit and GitHub Advanced Security review comments were present and unresolved; final maintainer disposition was not available.
- Open PR overlap and drift evidence were reviewed from supplied metadata, not by checking out and comparing every overlapping branch.
Full advisor summary
Base:
Blocked: current-head CI is pending, mergeability is BLOCKED, 4 review threads remain unresolved, and required sandbox/runtime/security E2E has not passed for exact head SHA fa28360.
Gate status
🔴 Blockers
🟡 Warnings
🔵 Suggestions
Acceptance coverage
Security review
Test / E2E status
✅ What looks good
Review completeness
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/test-issue-2478-crash-loop-recovery.sh (1)
278-301: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
`wait_for_gateway_up` timeout is not wall-clock bounded.
Line 278 can block up to 20s per poll, but Line 301 only increments `elapsed` by 2s. So `wait_for_gateway_up 45` can run for several minutes, making E2E timing behavior misleading and increasing hang risk.
Suggested fix
 gateway_runtime_ready() {
+  local status_timeout="${1:-5}"
   if sandbox_exec sh -c "grep -Fq '[gateway] ready' /tmp/gateway.log 2>/dev/null"; then
     return 0
   fi
@@
   local status_output
-  status_output="$(timeout 20 nemoclaw "$SANDBOX_NAME" status 2>&1)" || true
+  status_output="$(timeout "$status_timeout" nemoclaw "$SANDBOX_NAME" status 2>&1)" || true
@@
 wait_for_gateway_up() {
   local timeout="${1:-30}"
-  local elapsed=0 pid=""
-  while [ "$elapsed" -lt "$timeout" ]; do
+  local pid=""
+  local deadline=$((SECONDS + timeout))
+  while [ "$SECONDS" -lt "$deadline" ]; do
     pid="$(gateway_pid)"
-    if [ -n "$pid" ] && gateway_runtime_ready; then
+    if [ -n "$pid" ] && gateway_runtime_ready 5; then
       echo "$pid"
       return 0
     fi
     sleep 2
-    elapsed=$((elapsed + 2))
   done
   echo ""
   return 1
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-issue-2478-crash-loop-recovery.sh` around lines 278 - 301, The wait_for_gateway_up function can exceed the requested wall-clock timeout because gateway_runtime_ready (which invokes timeout 20) may block per iteration; fix by bounding each loop iteration to the remaining time and accounting for real elapsed time: compute remaining=$((timeout - elapsed)) at loop start, if remaining<=0 return 1, then call gateway_runtime_ready via the timeout wrapper (e.g. timeout "$remaining" gateway_runtime_ready) or modify gateway_runtime_ready to accept a per-call timeout parameter and pass min(remaining,20); measure time before/after the call and increment elapsed by the actual time spent (including sleep) so the while loop truly respects the wall-clock timeout while still calling gateway_pid and gateway_runtime_ready.
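The deadline arithmetic described in the prompt above can be modelled outside the shell script. The following is a minimal JavaScript sketch, not the test's actual code: `waitUntilReady`, `makeFakeClock`, and the probe callback are hypothetical names, and the 20s probe cap and 2s poll interval are taken from the review text. It uses a simulated clock so the worst case can be checked without real waiting.

```javascript
// Per-probe cap and poll interval in seconds (values quoted in the review comment).
const PROBE_CAP_S = 20;
const POLL_INTERVAL_S = 2;

// Hypothetical model of a wall-clock bounded wait loop.
// `probe(budget)` may block for up to `budget` seconds; `clock` supplies now() and sleep().
function waitUntilReady(probe, timeoutS, clock) {
  const deadline = clock.now() + timeoutS;
  for (;;) {
    const remaining = deadline - clock.now();
    if (remaining <= 0) return false;
    // Never give a single probe more time than is left before the deadline.
    if (probe(Math.min(remaining, PROBE_CAP_S))) return true;
    const left = deadline - clock.now();
    if (left <= 0) return false;
    clock.sleep(Math.min(POLL_INTERVAL_S, left));
  }
}

// Simulated clock: time only moves when sleep() or advance() is called.
function makeFakeClock() {
  let t = 0;
  return {
    now: () => t,
    sleep: (s) => { t += s; },
    advance: (s) => { t += s; },
  };
}

const clock = makeFakeClock();
// Worst case: every probe hangs for its whole budget and reports "not ready".
const ready = waitUntilReady((budget) => { clock.advance(budget); return false; }, 45, clock);
console.log(ready, clock.now()); // false 45
```

Because each probe's budget is clamped to the time remaining, the loop in this model stops at the requested 45 seconds even when every probe hangs for its full budget.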
🧹 Nitpick comments (5)
src/lib/onboard.ts (1)
2924-2925: ⚡ Quick win
Avoid duplicating the OpenShell fallback version string in multiple places.
"0.0.44" is now hardcoded in both fallback selection and install guidance. Please derive this from one shared value (e.g., blueprint max with a single constant fallback) so future bumps can’t desync runtime behavior vs user instructions.
Proposed refactor
+const DEFAULT_OPENSHELL_FALLBACK_VERSION = "0.0.44";
+
 function getOpenShellDockerSupervisorImage(versionOutput: string | null = null): string {
@@
-  const supportedVersion = installedVersion ?? getBlueprintMaxOpenshellVersion() ?? "0.0.44";
+  const supportedVersion =
+    installedVersion ?? getBlueprintMaxOpenshellVersion() ?? DEFAULT_OPENSHELL_FALLBACK_VERSION;
   return `ghcr.io/nvidia/openshell/supervisor:${supportedVersion}`;
 }
@@
-  console.error(" Install OpenShell v0.0.44, or set NEMOCLAW_OPENSHELL_GATEWAY_BIN.");
+  console.error(
+    ` Install OpenShell v${getBlueprintMaxOpenshellVersion() ?? DEFAULT_OPENSHELL_FALLBACK_VERSION}, or set NEMOCLAW_OPENSHELL_GATEWAY_BIN.`,
+  );
Also applies to: 4174-4174
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` around lines 2924 - 2925, The code duplicates the OpenShell fallback version string ("0.0.44") in multiple places; define a single shared constant (e.g., DEFAULT_OPENSHELL_VERSION) and use it as the final fallback when computing supportedVersion (replace the inline "0.0.44" in the installedVersion ?? getBlueprintMaxOpenshellVersion() ?? "0.0.44" expression) and also use that same constant wherever install guidance or other logic emits the hardcoded version (locations referencing getBlueprintMaxOpenshellVersion fallback such as the compute of supportedVersion and the guidance at the other occurrence around line 4174); update references to use the constant so future bumps change only one symbol.
test/e2e/lib/slack-api-proof.sh (1)
157-539: Run the two Slack-focused nightly E2E jobs for this change (`messaging-providers-e2e`, `openclaw-slack-pairing-e2e`) to validate both OpenClaw-backed and hermetic proof paths after the fallback/root-resolution refactor.
As per coding guidelines, test/e2e/lib/slack-api-proof.sh: “E2E test recommendation: - messaging-providers-e2e... - openclaw-slack-pairing-e2e...”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/lib/slack-api-proof.sh` around lines 157 - 539, Trigger the two Slack-focused nightly E2E jobs (messaging-providers-e2e and openclaw-slack-pairing-e2e) to validate both the OpenClaw-backed path and the hermetic fallback after the fallback/root-resolution refactor; run the jobs and confirm the code paths exercised by run_fake_slack_channel_mention_proof (which may call runOpenClawPrivateProof when resolveOpenClawRoot finds OpenClaw, otherwise runHermeticSlackProof) both succeed and produce the expected proof outputs.
test/e2e/test-openclaw-inference-switch.sh (1)
256-284: Given the switch-path assertion changed its OpenClaw invocation/parsing behavior, run `openclaw-inference-switch-e2e` to confirm route/config/live-request parity stays green.
As per coding guidelines, test/e2e/test-openclaw-inference-switch.sh: “E2E test recommendation: - openclaw-inference-switch-e2e...”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-openclaw-inference-switch.sh` around lines 256 - 284, The test changed OpenClaw invocation/parsing and needs an end-to-end run to verify route/config/live-request parity; run the e2e suite named openclaw-inference-switch-e2e and confirm the behavior in the check_openclaw_agent_turn function still produces the expected numeric reply (42) when invoking openclaw agent --json with the same session-id/command, updating the test if the invocation/output format changed so python3 "${E2E_DIR}/lib/openclaw-agent-json.py" still parses the output correctly and the grep check for 42 remains valid.
test/e2e/test-bedrock-runtime-compatible-anthropic.sh (1)
690-708: Given this parser-path change sits on the Bedrock-compatible agent-turn assertion, run `bedrock-runtime-compatible-anthropic-e2e` in nightly-e2e for regression confidence.
As per coding guidelines, test/e2e/test-bedrock-runtime-compatible-anthropic.sh: “E2E test recommendation: - bedrock-runtime-compatible-anthropic-e2e...”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-bedrock-runtime-compatible-anthropic.sh` around lines 690 - 708, The Bedrock-compatible OpenClaw agent E2E (check_openclaw_agent_turn) depends on the new parser-path change and needs nightly regression coverage; add the `bedrock-runtime-compatible-anthropic-e2e` job to the nightly-e2e CI pipeline (or include it in the nightly test matrix) so this test is executed every night, ensuring the `check_openclaw_agent_turn` scenario is exercised under the nightly run; make sure the job name exactly matches `bedrock-runtime-compatible-anthropic-e2e` and that it runs the test file `test/e2e/test-bedrock-runtime-compatible-anthropic.sh`.
test/e2e/test-messaging-compatible-endpoint.sh (1)
494-534: Run `messaging-compatible-endpoint-e2e` for this change, since it directly touches the Telegram + compatible-endpoint OpenClaw turn assertion path.
As per coding guidelines, test/e2e/test-messaging-compatible-endpoint.sh: “E2E test recommendation: - messaging-compatible-endpoint-e2e...”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-messaging-compatible-endpoint.sh` around lines 494 - 534, The change touches the OpenClaw turn path exercised by the check_openclaw_agent_turn function; run the messaging-compatible-endpoint-e2e end-to-end suite (which includes test/e2e/test-messaging-compatible-endpoint.sh) to validate the Telegram + compatible-endpoint OpenClaw turn assertions and ensure check_openclaw_agent_turn still passes (fix any failures revealed by that run).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js`:
- Around line 188-194: The split tool-call builder currently replaces the
original arguments with { command } causing schema drift; update
buildSplitToolCalls (and the other split-call sites referenced) to preserve the
original arguments shape by using the existing encodeToolCallArgumentsLike
helper: derive the new arguments by calling
encodeToolCallArgumentsLike(toolCall.arguments, command) (or the equivalent
helper used elsewhere) instead of hardcoding { command }, keep id creation via
buildSplitToolCallId and name "exec", and ensure message.content /
partial.content / event.toolCall retain the original argument schema across
split events.
- Around line 197-208: The function dedupeSafeExecToolCalls uses a single global
seenSafeExecCommands set which causes safe commands like hostname/date/uptime
from different tool invocations to be dropped; change the logic so the set is
cleared per original exec tool call rather than across the whole message: track
a per-call identifier (e.g. derive a call id from each block (use an existing
identifier on the block or add getExecToolCallId(block)), maintain currentCallId
and reset seenSafeExecCommands when the call id changes, and continue to use
SAFE_SPLIT_EXEC_COMMANDS and getExecToolCallCommand to decide which commands to
dedupe; update dedupeSafeExecToolCalls (and the other similar spots referenced)
to use this per-call dedupe behavior.
---
Outside diff comments:
In `@test/e2e/test-issue-2478-crash-loop-recovery.sh`:
- Around line 278-301: The wait_for_gateway_up function can exceed the requested
wall-clock timeout because gateway_runtime_ready (which invokes timeout 20) may
block per iteration; fix by bounding each loop iteration to the remaining time
and accounting for real elapsed time: compute remaining=$((timeout - elapsed))
at loop start, if remaining<=0 return 1, then call gateway_runtime_ready via the
timeout wrapper (e.g. timeout "$remaining" gateway_runtime_ready) or modify
gateway_runtime_ready to accept a per-call timeout parameter and pass
min(remaining,20); measure time before/after the call and increment elapsed by
the actual time spent (including sleep) so the while loop truly respects the
wall-clock timeout while still calling gateway_pid and gateway_runtime_ready.
---
Nitpick comments:
In `@src/lib/onboard.ts`:
- Around line 2924-2925: The code duplicates the OpenShell fallback version
string ("0.0.44") in multiple places; define a single shared constant (e.g.,
DEFAULT_OPENSHELL_VERSION) and use it as the final fallback when computing
supportedVersion (replace the inline "0.0.44" in the installedVersion ??
getBlueprintMaxOpenshellVersion() ?? "0.0.44" expression) and also use that same
constant wherever install guidance or other logic emits the hardcoded version
(locations referencing getBlueprintMaxOpenshellVersion fallback such as the
compute of supportedVersion and the guidance at the other occurrence around line
4174); update references to use the constant so future bumps change only one
symbol.
In `@test/e2e/lib/slack-api-proof.sh`:
- Around line 157-539: Trigger the two Slack-focused nightly E2E
jobs—messaging-providers-e2e and openclaw-slack-pairing-e2e—to validate both the
OpenClaw-backed path and the hermetic fallback after the
fallback/root-resolution refactor; run the jobs and confirm the code paths
exercised by run_fake_slack_channel_mention_proof (which may call
runOpenClawPrivateProof when resolveOpenClawRoot finds OpenClaw, otherwise
runHermeticSlackProof) both succeed and produce the expected proof outputs.
In `@test/e2e/test-bedrock-runtime-compatible-anthropic.sh`:
- Around line 690-708: The Bedrock-compatible OpenClaw agent E2E
(check_openclaw_agent_turn) depends on the new parser-path change and needs
nightly regression coverage; add the `bedrock-runtime-compatible-anthropic-e2e`
job to the nightly-e2e CI pipeline (or include it in the nightly test matrix) so
this test is executed every night, ensuring the `check_openclaw_agent_turn`
scenario is exercised under the nightly run; make sure the job name exactly
matches `bedrock-runtime-compatible-anthropic-e2e` and that it runs the test
file `test/e2e/test-bedrock-runtime-compatible-anthropic.sh`.
In `@test/e2e/test-messaging-compatible-endpoint.sh`:
- Around line 494-534: The change touches the OpenClaw turn path exercised by
the check_openclaw_agent_turn function; run the
messaging-compatible-endpoint-e2e end-to-end suite (which includes
test/e2e/test-messaging-compatible-endpoint.sh) to validate the Telegram +
compatible-endpoint OpenClaw turn assertions and ensure
check_openclaw_agent_turn still passes (fix any failures revealed by that run).
In `@test/e2e/test-openclaw-inference-switch.sh`:
- Around line 256-284: The test changed OpenClaw invocation/parsing and needs an
end-to-end run to verify route/config/live-request parity; run the e2e suite
named openclaw-inference-switch-e2e and confirm the behavior in the
check_openclaw_agent_turn function still produces the expected numeric reply
(42) when invoking openclaw agent --json with the same session-id/command,
updating the test if the invocation/output format changed so python3
"${E2E_DIR}/lib/openclaw-agent-json.py" still parses the output correctly and
the grep check for 42 remains valid.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3fca2a0f-1dd0-4b82-888c-43a34c1e5e30
📒 Files selected for processing (48)
.agents/skills/nemoclaw-user-reference/references/commands.md
.github/workflows/regression-e2e.yaml
Dockerfile
Dockerfile.base
agents/hermes/Dockerfile.base
agents/hermes/config/hermes-config.ts
agents/hermes/manifest.yaml
agents/openclaw/manifest.yaml
docs/reference/commands.mdx
nemoclaw-blueprint/blueprint.yaml
nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js
nemoclaw/package.json
nemoclaw/src/package-metadata.test.ts
scripts/brev-launchable-ci-cpu.sh
scripts/generate-openclaw-config.py
scripts/install-openshell.sh
scripts/patch-openclaw-tool-catalog.js
scripts/seed-wechat-accounts.py
src/ext/wechat/qr.ts
src/lib/onboard.ts
src/lib/onboard/docker-driver-gateway-runtime-marker.test.ts
src/lib/onboard/openshell-install.ts
src/lib/policy/index.ts
src/lib/sandbox/version.test.ts
src/lib/verify-deployment.test.ts
test/e2e-test.sh
test/e2e/docs/parity-inventory.generated.json
test/e2e/docs/parity-map.yaml
test/e2e/lib/openclaw-agent-json.py
test/e2e/lib/slack-api-proof.sh
test/e2e/test-bedrock-runtime-compatible-anthropic.sh
test/e2e/test-brave-search-e2e.sh
test/e2e/test-full-e2e.sh
test/e2e/test-issue-2478-crash-loop-recovery.sh
test/e2e/test-launchable-smoke.sh
test/e2e/test-messaging-compatible-endpoint.sh
test/e2e/test-openclaw-inference-switch.sh
test/e2e/test-openshell-gateway-upgrade.sh
test/e2e/test-openshell-version-pin.sh
test/e2e/test-sandbox-operations.sh
test/fetch-guard-patch-regression.test.ts
test/generate-openclaw-config.test.ts
test/install-openshell-version-check.test.ts
test/kimi-inference-compat-plugin.test.ts
test/openclaw-agent-json.test.ts
test/openclaw-tool-catalog-patch.test.ts
test/policies.test.ts
test/seed-wechat-accounts.test.ts
function buildSplitToolCalls(toolCall, commands) {
  return commands.map((command, index) => ({
    type: "toolCall",
    id: buildSplitToolCallId(toolCall.id, index, command),
    name: "exec",
    arguments: { command },
  }));
Preserve the original arguments shape when cloning split tool calls.
encodeToolCallArgumentsLike() fixes event.delta, but every split tool call still gets { command } here. If the source block used stringified JSON arguments, message.content, partial.content, and event.toolCall silently change schema mid-stream on the same path.
Proposed fix
 function buildSplitToolCalls(toolCall, commands) {
   return commands.map((command, index) => ({
     type: "toolCall",
     id: buildSplitToolCallId(toolCall.id, index, command),
     name: "exec",
-    arguments: { command },
+    arguments: encodeToolCallArgumentsLike(toolCall.arguments, command),
   }));
 }
Also applies to: 219-219, 245-250, 309-309
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js` around
lines 188 - 194, The split tool-call builder currently replaces the original
arguments with { command } causing schema drift; update buildSplitToolCalls (and
the other split-call sites referenced) to preserve the original arguments shape
by using the existing encodeToolCallArgumentsLike helper: derive the new
arguments by calling encodeToolCallArgumentsLike(toolCall.arguments, command)
(or the equivalent helper used elsewhere) instead of hardcoding { command },
keep id creation via buildSplitToolCallId and name "exec", and ensure
message.content / partial.content / event.toolCall retain the original argument
schema across split events.
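To make the shape-preservation idea concrete, here is a minimal sketch. It assumes the source arguments are either a stringified JSON object or a plain object; `encodeArgumentsLike` and `buildSplitToolCallsSketch` are hypothetical stand-ins, and the plugin's real `encodeToolCallArgumentsLike` and `buildSplitToolCallId` may behave differently.

```javascript
// Hypothetical helper: return the split command in the same *shape* as the
// original arguments (stringified JSON stays a string, an object stays an object).
function encodeArgumentsLike(originalArguments, command) {
  if (typeof originalArguments === "string") {
    return JSON.stringify({ command });
  }
  return { command };
}

// Sketch of a split builder that keeps the argument schema stable.
function buildSplitToolCallsSketch(toolCall, commands) {
  return commands.map((command, index) => ({
    type: "toolCall",
    id: `${toolCall.id}-split-${index}`, // assumed id scheme, for illustration only
    name: "exec",
    arguments: encodeArgumentsLike(toolCall.arguments, command),
  }));
}

// A source call whose arguments arrived as stringified JSON.
const stringified = { id: "call-1", arguments: '{"command":"hostname && date"}' };
const split = buildSplitToolCallsSketch(stringified, ["hostname", "date"]);
console.log(split.map((call) => typeof call.arguments)); // both entries are "string"
```

With this approach a consumer that parsed `arguments` as a JSON string before the split can keep doing so afterwards, which is the schema-drift concern the comment raises.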
function dedupeSafeExecToolCalls(content) {
  const seenSafeExecCommands = new Set();
  const deduped = [];
  for (const block of content) {
    const command = getExecToolCallCommand(block);
    if (SAFE_SPLIT_EXEC_COMMANDS.has(command)) {
      if (seenSafeExecCommands.has(command)) continue;
      seenSafeExecCommands.add(command);
    }
    deduped.push(block);
  }
  return deduped;
Don't dedupe safe exec calls across the whole message.
This helper remembers hostname / date / uptime globally, so once one hostname survives, every later hostname block in the same message gets dropped—even if it came from a different tool call. That loses legitimate tool invocations instead of just cleaning up duplicates introduced by one split.
Proposed fix
 function rewriteSafeCombinedExecToolCallsInContent(content) {
   if (!Array.isArray(content)) return { changed: false, content };
   let changed = false;
   const expanded = [];
   for (const block of content) {
     const split = getSafeCombinedExecToolCallFromBlock(block);
     if (split) {
-      expanded.push(...buildSplitToolCalls(split.toolCall, split.commands));
+      expanded.push(...dedupeSafeExecToolCalls(buildSplitToolCalls(split.toolCall, split.commands)));
       changed = true;
     } else {
       expanded.push(block);
     }
   }
   if (!changed) return { changed: false, content };
-  return { changed: true, content: dedupeSafeExecToolCalls(expanded) };
+  return { changed: true, content: expanded };
 }
 function applySafeExecSplitAtContentIndex(message, split) {
   if (!message || typeof message !== "object" || !Array.isArray(message.content) || !split) {
     return false;
   }
   const index = Number.isInteger(split.contentIndex) ? split.contentIndex : 0;
   if (index < 0 || index >= message.content.length) return false;
-  const replacement = buildSplitToolCalls(split.toolCall, split.commands);
-  message.content = dedupeSafeExecToolCalls([
+  const replacement = dedupeSafeExecToolCalls(buildSplitToolCalls(split.toolCall, split.commands));
+  message.content = [
     ...message.content.slice(0, index),
     ...replacement,
     ...message.content.slice(index + 1),
-  ]);
+  ];
   if (message.stopReason === "stop") message.stopReason = "toolUse";
   return true;
 }
Also applies to: 227-227, 246-250
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.js` around
lines 197 - 208, The function dedupeSafeExecToolCalls uses a single global
seenSafeExecCommands set which causes safe commands like hostname/date/uptime
from different tool invocations to be dropped; change the logic so the set is
cleared per original exec tool call rather than across the whole message: track
a per-call identifier (e.g. derive a call id from each block (use an existing
identifier on the block or add getExecToolCallId(block)), maintain currentCallId
and reset seenSafeExecCommands when the call id changes, and continue to use
SAFE_SPLIT_EXEC_COMMANDS and getExecToolCallCommand to decide which commands to
dedupe; update dedupeSafeExecToolCalls (and the other similar spots referenced)
to use this per-call dedupe behavior.
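The difference between message-wide and per-call dedupe can be shown with a small standalone sketch. The block shape, the `SAFE_COMMANDS` contents, and the helper names below are assumptions for illustration; they are not the plugin's actual `SAFE_SPLIT_EXEC_COMMANDS` or `getExecToolCallCommand`.

```javascript
// Assumed set of commands that are safe to dedupe (the review names hostname / date / uptime).
const SAFE_COMMANDS = new Set(["hostname", "date", "uptime"]);

// Hypothetical accessor: the command of an exec tool-call block, or undefined.
function commandOf(block) {
  const isExec = block && block.type === "toolCall" && block.name === "exec";
  return isExec && block.arguments ? block.arguments.command : undefined;
}

// Dedupe only within the blocks produced by ONE split, using a fresh Set each time.
function dedupeWithinSplit(blocks) {
  const seen = new Set();
  return blocks.filter((block) => {
    const command = commandOf(block);
    if (!SAFE_COMMANDS.has(command)) return true; // not a safe command: always keep
    if (seen.has(command)) return false;          // duplicate inside this split: drop
    seen.add(command);
    return true;
  });
}

function exec(id, command) {
  return { type: "toolCall", id, name: "exec", arguments: { command } };
}

// Two independent tool calls, each split separately.
const firstSplit = dedupeWithinSplit([exec("a-0", "hostname"), exec("a-1", "hostname"), exec("a-2", "date")]);
const secondSplit = dedupeWithinSplit([exec("b-0", "hostname")]);
const content = [...firstSplit, ...secondSplit];
console.log(content.map((block) => block.id)); // ids: a-0, a-2, b-0
```

The duplicate `hostname` inside the first split is dropped, while the `hostname` from the second, unrelated call survives. A single message-wide Set would have removed `b-0` as well.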
Selective E2E Results: ❌ Some jobs failed (Run: 26188581699)
Selective E2E Results: ✅ All requested jobs passed (Run: 26194607183)
Selective E2E Results: ❌ Some jobs failed (Run: 26194670119)
Selective E2E Results: ✅ All requested jobs passed (Run: 26194838115)
Selective E2E Results: ✅ All requested jobs passed (Run: 26196765611)
Selective E2E Results: ❌ Some jobs failed (Run: 26196857472)
Selective E2E Results: ✅ All requested jobs passed (Run: 26198339046)
QA test report: DGX Spark + DGX Station ARM64
TL;DR: Image-managed extension preservation (T6047546) PASS across 3/3 sandboxes on 2/2 ARM64 boxes, but a separate post-rebuild gateway-init regression reproduces consistently in this PR vs
What was tested
N1X (Snapdragon X ARM64 WSL2) test is in progress; will follow up in this thread once complete.
The regression: #3975 / NVB#6198894
Post-rebuild the in-sandbox OpenClaw gateway never starts. All 3 sandboxes above end up in
The only line that ever appears in
No
Negative control on same Spark box
Installed
Gateway log contains the full init sequence and
So the regression is specifically on the PR's OpenClaw 2026.5.18 path, not pre-existing on
Other observations (not blocking)
DevTest evidence
Under
Recommendation: investigate why OpenClaw 2026.5.18 doesn't bring up
Summary
Verification
Links
Do not merge until release approval is explicit.
Summary by CodeRabbit
Chores
New Features
Improvements
Tests
Documentation