bug: provider cleanup can fail after sandbox deletion conflict #1517

@TaylorMutch

Agent Diagnostic

Skills loaded: watch-github-actions, create-github-issue.

Investigation method: Watched PR #1516 checks after applying the test:e2e label, inspected the failed Python E2E job through the GitHub Actions logs, and compared the failure with the Docker routing change in the PR.

Findings:

Description

Actual behavior: The Python E2E suite can fail during cleanup when a sandbox delete races with another update to the same sandbox resource. The sandbox deletion RPC returns ABORTED, which leaves the sandbox in place, and the subsequent provider cleanup then fails with FAILED_PRECONDITION because the provider is still attached to that sandbox.

Expected behavior: Cleanup paths should be resilient to transient sandbox resource-version conflicts. A delete request should retry the phase transition or otherwise leave the sandbox/provider relationship in a cleanup-safe state so provider deletion does not fail after a transient concurrent modification.
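A minimal sketch of the retry behavior described above, assuming the delete call can simply be reissued after a conflict. Everything here is hypothetical and not taken from the OpenShell code: `retry_on_conflict`, `ConflictError`, and `make_flaky_delete` are illustrative names, and the backoff values are arbitrary. With a real gRPC client, the `is_conflict` predicate could instead check `err.code() == grpc.StatusCode.ABORTED` on a `grpc.RpcError`.

```python
import time


class ConflictError(Exception):
    """Hypothetical stand-in for an RPC error with status ABORTED."""


def retry_on_conflict(op, is_conflict, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call op(), retrying with exponential backoff while is_conflict(err) is true."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception as err:
            # Re-raise anything that is not a conflict, and the final conflict.
            if not is_conflict(err) or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def make_flaky_delete(conflicts):
    """Hypothetical delete call that hits `conflicts` conflicts before succeeding."""
    state = {"calls": 0}

    def delete():
        state["calls"] += 1
        if state["calls"] <= conflicts:
            raise ConflictError("concurrent modification")
        return state["calls"]

    return delete
```

In a test teardown this would wrap only the sandbox delete, so that the provider delete runs after the sandbox is actually gone. Whether the retry belongs in the client, the test fixture, or the server-side phase transition is left open here.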

Reproduction Steps

  1. Run the Python E2E suite with Docker-backed sandboxes on Linux ARM64, as in the e2e / E2E (python) GitHub Actions job.
  2. Observe e2e/python/test_sandbox_providers.py::test_nvidia_provider_injects_nvidia_api_key_env_var.
  3. The test can fail during cleanup after sandbox creation and execution have succeeded.

Environment

  • OS: Linux ARM64 GitHub Actions runner (linux-arm64-cpu8)
  • Docker API: 1.54
  • OpenShell commit: c0a9306b1884d4584dc36b9f0b9cc85942da3dad
  • Workflow: Branch E2E Checks
  • Job: e2e / E2E (python)

Logs

FAILED e2e/python/test_sandbox_providers.py::test_nvidia_provider_injects_nvidia_api_key_env_var

grpc._channel._InactiveRpcError:
  status = StatusCode.ABORTED
  details = "set sandbox phase to Deleting failed due to concurrent modification (current resource_version: 3)"

During handling of the above exception, another exception occurred:

grpc._channel._InactiveRpcError:
  status = StatusCode.FAILED_PRECONDITION
  details = "provider 'e2e-test-nvidia-provider-env' is attached to sandbox(es): ecstatic-bug"

1 failed, 75 passed in 54.98s
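The "During handling of the above exception, another exception occurred" line is what Python prints when a second exception is raised while the first is still being handled. One common way this arises is a cleanup step that runs in a `finally` block. The sketch below reproduces that shape with stand-in exceptions. It is an assumption about the structure of the failure, not the actual test code; `SandboxConflict`, `ProviderAttached`, and the helper functions are hypothetical names.

```python
class SandboxConflict(Exception):
    """Hypothetical stand-in for the ABORTED sandbox-delete error."""


class ProviderAttached(Exception):
    """Hypothetical stand-in for the FAILED_PRECONDITION provider-delete error."""


def delete_sandbox():
    raise SandboxConflict("concurrent modification")


def delete_provider():
    raise ProviderAttached("provider is attached to sandbox(es)")


def cleanup():
    try:
        delete_sandbox()
    finally:
        # Runs while SandboxConflict is still propagating, so the new
        # exception records it as __context__.
        delete_provider()


def run_cleanup():
    """Return the exception that escapes cleanup(), for inspection."""
    try:
        cleanup()
    except ProviderAttached as err:
        return err
```

Under this shape the provider error is the one that surfaces, while the sandbox conflict that caused it is only visible as the chained context, which matches the order of the two errors in the log.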
