Skip to content

SubprocessCLITransport.close() hangs indefinitely if subprocess shutdown handler blocks #728

@xu-jenny

Description

@xu-jenny

Bug

SubprocessCLITransport.close() in the Python claude-agent-sdk sends SIGTERM to the Claude Code subprocess, then calls await self._process.wait() with no timeout. If the Claude Code Node.js process catches SIGTERM and runs a shutdown handler that gets stuck (e.g., waiting for an MCP server or subagent to exit), process.wait() blocks forever.

Impact

In our use case (running ~200 parallel Claude Code instances via the SDK), 30% of jobs hung indefinitely — Claude code job finished successfully but the VM stayed hanging because the SDK's transport.close() never returned, causing the entire worker process to hit VM infra timeout.

Suggested fix

Wrap this in close()

  async def close(self) -> None:
      if self._process and self._process.returncode is None:
          self._process.terminate()
          try:
              async with anyio.fail_after(shutdown_timeout):  # e.g. 30s
                  await self._process.wait()
          except TimeoutError:
              self._process.kill()
              await self._process.wait()
      self._process = None

Ideally shutdown_timeout would be configurable via ClaudeAgentOptions (we used 30s successfully).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions