Agent Diagnostic
Skills loaded: watch-github-actions, create-github-issue.
Investigation method: Watched PR #1516 checks after applying the test:e2e label, inspected the failed Python E2E job through the GitHub Actions logs, and compared the failure with the Docker routing change in the PR.
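Findings:
- e2e / E2E (rust-docker) passed.
- e2e / E2E (rust-podman) passed.
- e2e / E2E (python) failed in provider cleanup, not during gateway startup or Docker callback routing.
- The sandbox delete request returned ABORTED while setting the sandbox phase to Deleting. Provider cleanup then failed because the provider was still attached to the sandbox.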
Description
Actual behavior: Python E2E can fail during cleanup when deleting a sandbox races with another sandbox resource update. The sandbox deletion RPC returns ABORTED, and the provider cleanup then fails with FAILED_PRECONDITION because the provider is still attached to the sandbox.
Expected behavior: Cleanup paths should be resilient to transient sandbox resource-version conflicts. A delete request should retry the phase transition or otherwise leave the sandbox/provider relationship in a cleanup-safe state so provider deletion does not fail after a transient concurrent modification.
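The retry behavior described above can be sketched as follows. This is a minimal illustration, not OpenShell's implementation: `FakeStatus`, `FakeRpcError`, `is_conflict`, and `retry_on_conflict` are hypothetical names, and the stand-in error class only mimics the `code()` accessor that gRPC Python errors expose, so the sketch runs without grpcio installed. The same loop shape would apply whether the retry lives in the client cleanup path or around the server-side phase transition.

```python
import time
from enum import Enum


class FakeStatus(Enum):
    """Stand-in for grpc.StatusCode (assumption: only the names matter here)."""
    ABORTED = "aborted"
    FAILED_PRECONDITION = "failed precondition"


class FakeRpcError(Exception):
    """Stand-in for a gRPC error; real RPC errors also expose code()."""

    def __init__(self, status, details=""):
        super().__init__(details)
        self._status = status

    def code(self):
        return self._status


def is_conflict(exc):
    """Return True when the error looks like a transient ABORTED conflict."""
    code = getattr(exc, "code", None)
    return callable(code) and getattr(code(), "name", None) == "ABORTED"


def retry_on_conflict(operation, attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on ABORTED only."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # Anything other than a conflict, or the last attempt, propagates.
            if not is_conflict(exc) or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

In the real suite, `operation` would be whatever call issues the sandbox delete RPC. FAILED_PRECONDITION is deliberately not retried, since it signals a state problem rather than a transient race.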
Reproduction Steps
- Run the Python E2E suite with Docker-backed sandboxes on Linux ARM64, as in the e2e / E2E (python) GitHub Actions job.
- Observe e2e/python/test_sandbox_providers.py::test_nvidia_provider_injects_nvidia_api_key_env_var.
- The test can fail during cleanup after sandbox creation and execution have succeeded.
Environment
- OS: Linux ARM64 GitHub Actions runner (linux-arm64-cpu8)
- Docker API: 1.54
- OpenShell commit: c0a9306b1884d4584dc36b9f0b9cc85942da3dad
- Workflow: Branch E2E Checks
- Job: e2e / E2E (python)
Logs
FAILED e2e/python/test_sandbox_providers.py::test_nvidia_provider_injects_nvidia_api_key_env_var
grpc._channel._InactiveRpcError:
status = StatusCode.ABORTED
details = "set sandbox phase to Deleting failed due to concurrent modification (current resource_version: 3)"
During handling of the above exception, another exception occurred:
grpc._channel._InactiveRpcError:
status = StatusCode.FAILED_PRECONDITION
details = "provider 'e2e-test-nvidia-provider-env' is attached to sandbox(es): ecstatic-bug"
1 failed, 75 passed in 54.98s