Skip to content

fix(e2e): use sandbox subcommands in scenario suites#3927

Open
jyaunches wants to merge 1 commit into
mainfrom
fix/baseline-sandbox-commands
Open

fix(e2e): use sandbox subcommands in scenario suites#3927
jyaunches wants to merge 1 commit into
mainfrom
fix/baseline-sandbox-commands

Conversation

@jyaunches
Copy link
Copy Markdown
Contributor

@jyaunches jyaunches commented May 20, 2026

Summary

  • update baseline onboarding scenario assertions to call nemoclaw sandbox status/logs <name>
  • update sandbox lifecycle scenario assertions to use the same current sandbox subcommands
  • adjust mocked scenario-framework test coverage for the current CLI shape

Root cause

Nightly E2E run 26185198123 failed onboard-negative-paths-e2e because the new baseline scenario suite called legacy-style nemoclaw status <sandbox>. The current global status command no longer accepts a sandbox name and reports Unexpected argument; sandbox-specific status/logs live under nemoclaw sandbox ....

Validation

  • git diff --check
  • npm test -- --run test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts test/e2e/scenario-framework-tests/e2e-suite-runner.test.ts

@github-actions
Copy link
Copy Markdown
Contributor

PR Review Advisor

Recommendation: blocked
Confidence: medium
Analyzed HEAD: cdfaa5262f2678217f0e0e6cc4d25abec086f086
Findings: 1 blocker(s), 2 warning(s), 0 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations: Review used trusted deterministic PR metadata and read-only file inspection only; no commands, tests, package-manager operations, or PR scripts were executed.; No linked issues or issue comments were available, so acceptance mapping is based on PR-body clauses and diff evidence rather than linked-issue acceptance criteria.; CI and E2E checks were still pending in the provided metadata, so final pass/fail signal for the latest head SHA was unavailable.; Overlapping PRs were identified from trusted metadata but their full diffs were not inspected in this review.

Workflow run

Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: cdfaa5262f2678217f0e0e6cc4d25abec086f086
Recommendation: blocked
Confidence: medium

Patch is a narrow E2E command-shape fix and appears correct, but merge is currently blocked by pending CI/mergeStateStatus and overlapping active PRs touching the same files.

Gate status

  • CI: pending — Trusted metadata reports 12 pending status contexts for head SHA cdfaa52, including E2E recommendation, wsl-e2e, macos-e2e, PR review advisor, CodeQL, ShellCheck SARIF, unit-vitest-linux, checks, and sandbox image builds.
  • Mergeability: fail — GitHub GraphQL reports mergeStateStatus=BLOCKED and reviewDecision=REVIEW_REQUIRED for PR fix(e2e): use sandbox subcommands in scenario suites #3927.
  • Review threads: pass — GraphQL reviewThreads.nodes is empty; trusted gate summary also notes no review thread state was available.
  • Risky code tested: pass — Trusted path heuristics detected no risky runtime/security code areas; changed files are E2E test/helper scripts only.

🔴 Blockers

  • Merge is blocked until required checks complete: The latest head SHA has multiple in-progress or queued required/relevant checks, and GitHub reports the PR merge state as blocked.
    • Recommendation: Wait for all required checks, especially ShellCheck, unit-vitest-linux, E2E recommendation, wsl-e2e, and macos-e2e, to complete successfully for cdfaa52 before considering merge.
    • Evidence: statusCheckRollup includes IN_PROGRESS/QUEUED checks; mergeStateStatus=BLOCKED.

🟡 Warnings

🔵 Suggestions

  • None.

Acceptance coverage

  • met — update baseline onboarding scenario assertions to call nemoclaw sandbox status/logs <name>: baseline_onboarding.sh now calls nemoclaw sandbox status "$E2E_SANDBOX_NAME" and nemoclaw sandbox logs "$E2E_SANDBOX_NAME".
  • met — update sandbox lifecycle scenario assertions to use the same current sandbox subcommands: sandbox_lifecycle.sh now calls nemoclaw sandbox status "${E2E_SANDBOX_NAME}" and nemoclaw sandbox logs "${E2E_SANDBOX_NAME}".
  • met — adjust mocked scenario-framework test coverage for the current CLI shape: e2e-lib-helpers.test.ts mock nemoclaw script now recognizes $1 $2 == "sandbox status" and $1 $2 == "sandbox logs".
  • partial — Nightly E2E run 26185198123 failed onboard-negative-paths-e2e because the new baseline scenario suite called legacy-style nemoclaw status <sandbox>. The current global status command no longer accepts a sandbox name and reports Unexpected argument; sandbox-specific status/logs live under nemoclaw sandbox ....: The diff updates the affected baseline and sandbox lifecycle calls from legacy nemoclaw status/logs <sandbox> to nemoclaw sandbox status/logs <sandbox>. The cited nightly failure itself is PR-body evidence only and was not independently verified here.
  • unknowngit diff --check: Claim appears in PR body validation, but no trusted result for this command was provided in the deterministic context.
  • unknownnpm test -- --run test/e2e/scenario-framework-tests/e2e-lib-helpers.test.ts test/e2e/scenario-framework-tests/e2e-suite-runner.test.ts: Claim appears in PR body validation, but no trusted test result for this command was provided. CI unit-vitest-linux is queued/pending.

Security review

  • pass — 1. Secrets and Credentials: No hardcoded secrets, tokens, keys, credential files, or connection strings are introduced. The patch only changes E2E command invocations and a test mock.
  • pass — 2. Input Validation and Data Sanitization: No new user input surface is added. Existing sandbox-name variables remain quoted in shell command invocations, limiting shell word-splitting/globbing risk. No eval, unsafe deserialization, path traversal, SSRF, or command-string construction is introduced by this patch.
  • pass — 3. Authentication and Authorization: No authentication or authorization logic is modified. The change exercises existing nemoclaw sandbox status/logs CLI paths in E2E helper scripts.
  • pass — 4. Dependencies and Third-Party Libraries: No dependencies, package manifests, lockfiles, registries, or third-party libraries are added or changed.
  • pass — 5. Error Handling and Logging: Failure output behavior is unchanged except for the command being invoked. No new logging of secrets or sensitive data is introduced.
  • pass — 6. Cryptography and Data Protection: Not applicable — no cryptographic operations or data-protection mechanisms are changed.
  • pass — 7. Configuration and Security Headers: No HTTP endpoint configuration, CORS/CSP, Dockerfile, container permission, or security-header settings are changed.
  • pass — 8. Security Testing: The touched files are E2E validation helpers and their unit-style scenario-framework tests. The mock test was updated to verify the new sandbox subcommand shape for status/logs. No existing security test coverage appears degraded by this narrow diff.
  • pass — 9. Holistic Security Posture: The change aligns E2E assertions with the current sandbox-specific CLI boundary and does not modify sandbox runtime, policy, network, credential, installer, workflow, or blueprint logic. No sandbox escape, SSRF bypass, policy bypass, credential leakage, or workflow trusted-code-boundary issue is apparent in the diff.

Test / E2E status

  • Test depth: unit_sufficient — The diff is limited to E2E validation helper command names and a mocked helper test. The provided testDepth context says these changes cannot affect runtime behavior directly; however, because the PR is intended to fix an E2E failure, pending platform E2E jobs remain important gate evidence.
  • E2E Advisor: missing (not found)
  • Required E2E jobs: E2E recommendation, wsl-e2e, macos-e2e
  • Missing for analyzed SHA: E2E recommendation, wsl-e2e, macos-e2e

✅ What looks good

  • The patch is small and focused: 5 insertions and 5 deletions across three E2E-related files.
  • The legacy nemoclaw status/logs <sandbox> calls were consistently updated to nemoclaw sandbox status/logs <sandbox> in both affected validation helpers.
  • Sandbox names remain shell-quoted in the changed command invocations.
  • The scenario-framework mock was updated alongside the helper changes, reducing unit-test drift from the expected CLI shape.

Review completeness

  • Review used trusted deterministic PR metadata and read-only file inspection only; no commands, tests, package-manager operations, or PR scripts were executed.
  • No linked issues or issue comments were available, so acceptance mapping is based on PR-body clauses and diff evidence rather than linked-issue acceptance criteria.
  • CI and E2E checks were still pending in the provided metadata, so final pass/fail signal for the latest head SHA was unavailable.
  • Overlapping PRs were identified from trusted metadata but their full diffs were not inspected in this review.
  • Human maintainer review required: yes

@github-actions
Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: ubuntu-repo-cloud-openclaw, sandbox-operations-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No merge-blocking E2E is required because the PR only changes E2E test/helper code and nearby helper unit tests, not installer, onboarding, sandbox runtime, credentials, security policy, inference routing, deployment, or user-facing assistant runtime code. Optional live E2E can validate that the changed assertions still match the current CLI contract.

Optional E2E

  • ubuntu-repo-cloud-openclaw (medium-high: live install/onboard with NVIDIA API and Docker sandbox; up to 90 minute workflow timeout): Optional confidence check for the exact changed validation helpers: run the Ubuntu repo scenario with suite_filter baseline-onboarding,sandbox-operations,sandbox-lifecycle to verify the updated nemoclaw sandbox status/logs assertions against a live sandbox.
  • sandbox-operations-e2e (high: live sandbox operations suite with extended timeout): Broader optional regression coverage for live sandbox list/status/log/exec behavior adjacent to the helper changes, though this legacy E2E script does not directly exercise the modified scenario validation helper library.

New E2E recommendations

  • None.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants