Skip to content

[codex] add live provider E2E workflow#175

Open
zozo123 wants to merge 1 commit into
openclaw:mainfrom
zozo123:codex/live-provider-e2e-gh-actions
Open

[codex] add live provider E2E workflow#175
zozo123 wants to merge 1 commit into
openclaw:mainfrom
zozo123:codex/live-provider-e2e-gh-actions

Conversation

@zozo123
Copy link
Copy Markdown
Contributor

@zozo123 zozo123 commented May 27, 2026

Summary

  • add a manual Live Provider E2E GitHub Actions workflow that expands a selected provider list into one smoke job per built-in provider
  • add scripts/live-provider-e2e.sh to run fast provider-specific smoke checks, skip missing secret/tool setup when requested, and clean up leases where supported
  • document the maintainer secret checklist and workflow controls in docs/features/live-provider-e2e.md

Notes

The workflow is workflow_dispatch only, defaults to allow_missing=true, and does not expose provider secrets to pull request jobs. Maintainers can switch allow_missing=false once the selected provider secrets are expected to be present.

Some providers still require a runner-side CLI or provider-side resource that cannot be represented as a GitHub secret, such as devbox, sprite, an existing Railway service, or a self-hosted Parallels host. Those requirements are called out in the doc.

Verification

  • bash -n scripts/live-provider-e2e.sh
  • ruby -e 'require "yaml"; YAML.load_file(".github/workflows/live-provider-e2e.yml"); puts "workflow yaml ok"'
  • go build -trimpath -o /tmp/crabbox-provider-e2e ./cmd/crabbox
  • node scripts/check-docs-links.mjs
  • CRABBOX_LIVE=1 CRABBOX_LIVE_SKIP_MISSING=1 CRABBOX_BIN=/tmp/crabbox-provider-e2e scripts/live-provider-e2e.sh aws
  • CRABBOX_LIVE=1 CRABBOX_LIVE_SKIP_MISSING=1 CRABBOX_BIN=/tmp/crabbox-provider-e2e scripts/live-provider-e2e.sh e2b

@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 27, 2026

Codex review: needs real behavior proof before merge. Reviewed May 31, 2026, 3:38 AM ET / 07:38 UTC.

Summary
The PR adds a manual GitHub Actions live-provider E2E matrix, a provider smoke shell script, and docs for maintainer setup and workflow controls.

Reproducibility: not applicable. this is a new maintainer-run workflow feature, not a bug report. The PR defects are source-reproducible from the workflow env block and the mismatch with current provider docs/registry.

Review metrics: 3 noteworthy metrics.

  • Diff surface: 4 files changed, 735 additions. The PR adds a privileged workflow plus a new live smoke script, so secret handling and coverage need review before merge.
  • Secret bindings: 89 job-level secret references. Every provider job inherits the full credential set unless the workflow scopes secrets more tightly.
  • Provider coverage: 23 hardcoded, 2 documented built-ins missing. The new all matrix is already behind current provider docs and registry imports.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🧂 unranked krab
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted terminal or workflow-log proof showing the new script/workflow path after the fix; updating the PR body should trigger a fresh ClawSweeper review, or a maintainer can comment @clawsweeper re-review if it does not.
  • [P1] Scope provider secrets to the selected provider after matrix selection and before only the steps that need them.
  • Derive or test the provider matrix against the registered/documented provider list.

Proof guidance:

  • [P1] Needs real behavior proof before merge: Missing: the PR body lists verification commands but no redacted terminal output, workflow log, or artifact proving the new workflow/script behavior; add proof with private details redacted and update the PR body for re-review.

Risk before merge

  • [P1] Merging as-is would fan out 89 repository secrets into every provider job, including setup and third-party package installation steps that do not need most of those credentials.
  • [P1] The providers=all path would silently miss documented built-ins azure-dynamic-sessions and ascii-box, giving maintainers false provider coverage.
  • [P1] The PR body has command names but no redacted terminal output, workflow log, or artifact proving the new workflow/script behavior.
  • [P1] GitHub context reports the branch as dirty, so maintainers need a refreshed merge result before any final landing decision.

Maintainer options:

  1. Repair workflow before merge (recommended)
    Scope secrets per provider, derive or test the provider matrix against registered providers, and require redacted live/script proof before landing.
  2. Keep the proposal as the decision point
    If maintainers are not ready to own the provider-secret and runner setup policy, keep Proposal: add live provider E2E GitHub Actions matrix #176 as the product discussion and pause or close this branch.
  3. Explicitly accept the secret fanout
    Maintainers could intentionally accept all-secret fanout only after approving that security boundary and documenting the operator risk.

Next step before merge

  • [P1] Contributor proof and maintainer-visible secret-scoping decisions are required before merge; automation cannot provide proof for the contributor's live provider setup.

Security
Needs attention: The workflow currently exposes all provider credentials to every provider job and third-party setup/install steps.

Review findings

  • [P1] Scope provider secrets to selected jobs — .github/workflows/live-provider-e2e.yml:121-209
  • [P1] Build the matrix from registered providers — .github/workflows/live-provider-e2e.yml:52-76
Review details

Best possible solution:

Land a narrowed manual workflow that derives or tests provider coverage against Crabbox's registered providers, scopes only selected-provider secrets into the steps that need them, and includes redacted live workflow/script proof.

Do we have a high-confidence way to reproduce the issue?

Not applicable; this is a new maintainer-run workflow feature, not a bug report. The PR defects are source-reproducible from the workflow env block and the mismatch with current provider docs/registry.

Is this the best way to solve the issue?

No; the proposed direction is useful, but the implementation should scope secrets per selected provider and derive or test the matrix from the registered provider source before it is the maintainable solution.

Full review comments:

  • [P1] Scope provider secrets to selected jobs — .github/workflows/live-provider-e2e.yml:121-209
    The provider job places every provider credential in job-level env, so checkout/setup actions, the common install step, and the modal/tensorlake pip install run with unrelated AWS/GCP/SSH/etc. secrets. Please inject only the selected provider's secrets into the smoke step or a provider-specific setup step.
    Confidence: 0.94
  • [P1] Build the matrix from registered providers — .github/workflows/live-provider-e2e.yml:52-76
    The hardcoded all list is already stale: current main documents/registers azure-dynamic-sessions and ascii-box, but providers=all would never schedule them. A workflow advertised as every built-in provider needs to derive from the registered providers or share a checked list with tests.
    Confidence: 0.92

Overall correctness: patch is incorrect
Overall confidence: 0.91

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 9a42efcdaefc.

Label changes

Label justifications:

  • P2: This is a useful but non-emergency maintainer workflow improvement with limited blast radius and clear merge blockers.
  • merge-risk: 🚨 security-boundary: The workflow currently exposes all provider credentials to every provider job and setup/install step.
  • merge-risk: 🚨 automation: The new provider automation hardcodes a stale matrix, so green checks would not prove every built-in provider path.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🧂 unranked krab.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: Missing: the PR body lists verification commands but no redacted terminal output, workflow log, or artifact proving the new workflow/script behavior; add proof with private details redacted and update the PR body for re-review.
Evidence reviewed

Security concerns:

  • [high] All provider secrets are job-scoped — .github/workflows/live-provider-e2e.yml:121
    Lines 121-209 bind the full provider credential set at job scope, so unrelated provider jobs and package installation steps inherit secrets they do not need. That broadens the blast radius of a manual live-provider workflow that handles real cloud credentials.
    Confidence: 0.94

What I checked:

  • PR diff surface: Commit 4132f3a adds one workflow, one script, one docs page, and one docs index entry for 735 insertions. (4132f3aeabf3)
  • Provider secrets are job-scoped: The provider-e2e job binds 89 secrets at job env scope before checkout/setup/install/smoke steps, including third-party Python package installs for modal and tensorlake. (.github/workflows/live-provider-e2e.yml:121, 4132f3aeabf3)
  • Provider matrix is hardcoded: The workflow's all provider list hardcodes 23 identifiers instead of deriving from Crabbox's registered providers or a tested source of truth. (.github/workflows/live-provider-e2e.yml:52, 4132f3aeabf3)
  • Current provider docs/registry include omitted providers: Current main documents ascii-box and azure-dynamic-sessions, and the provider registry imports both providers/asciibox and providers/azuredynamicsessions; the PR matrix does not schedule either. (docs/providers/README.md:66, 9a42efcdaefc)
  • No existing live-provider workflow on main: Current main has only ci.yml, hydrate.yml, pages.yml, and release.yml under .github/workflows, so the central requested workflow is not already implemented. (.github/workflows, 9a42efcdaefc)
  • Proof remains absent: The PR body lists verification commands and the latest contributor comment asks maintainers to review, but the provided body/comments include no redacted terminal output, workflow log, or artifact showing after-fix behavior.

Likely related people:

  • Peter Steinberger: Current blame for the provider registry/test, provider docs, and existing live-smoke script points to release commit e35b7a7, making this the clearest routing signal for provider workflow review. (role: recent area contributor; confidence: medium; commits: e35b7a78d5bb; files: internal/providers/all/all.go, internal/providers/all/all_test.go, docs/features/providers.md)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. labels May 27, 2026
@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 27, 2026

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@zozo123
Copy link
Copy Markdown
Contributor Author

zozo123 commented May 30, 2026

Would be nice to have ci/cd to smoke test the providers or local sandboxes across machines and archs. Changing as ready for review so maintainers think about it and consider it and help to develop and make it happen on the ci/cd.

@zozo123 zozo123 marked this pull request as ready for review May 31, 2026 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. P2 Normal priority bug or improvement with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant