Bug
SubprocessCLITransport.close() in the Python claude-agent-sdk sends SIGTERM to the Claude Code subprocess, then calls await self._process.wait() with no timeout. If the Claude Code Node.js process catches SIGTERM and runs a shutdown handler that gets stuck (e.g., waiting for an MCP server or subagent to exit), process.wait() blocks forever.
Impact
In our use case (running ~200 parallel Claude Code instances via the SDK), 30% of jobs hung indefinitely — Claude code job finished successfully but the VM stayed hanging because the SDK's transport.close() never returned, causing the entire worker process to hit VM infra timeout.
Suggested fix
Wrap this in close()
async def close(self) -> None:
if self._process and self._process.returncode is None:
self._process.terminate()
try:
async with anyio.fail_after(shutdown_timeout): # e.g. 30s
await self._process.wait()
except TimeoutError:
self._process.kill()
await self._process.wait()
self._process = None
Ideally shutdown_timeout would be configurable via ClaudeAgentOptions (we used 30s successfully).
Bug
SubprocessCLITransport.close() in the Python claude-agent-sdk sends SIGTERM to the Claude Code subprocess, then calls await self._process.wait() with no timeout. If the Claude Code Node.js process catches SIGTERM and runs a shutdown handler that gets stuck (e.g., waiting for an MCP server or subagent to exit), process.wait() blocks forever.
Impact
In our use case (running ~200 parallel Claude Code instances via the SDK), 30% of jobs hung indefinitely — Claude code job finished successfully but the VM stayed hanging because the SDK's transport.close() never returned, causing the entire worker process to hit VM infra timeout.
Suggested fix
Wrap this in
close()Ideally shutdown_timeout would be configurable via ClaudeAgentOptions (we used 30s successfully).