Plugin v1.0.4 codex-companion review --background hangs 2-30min into review (CLI 0.125, Windows) #277

@nixichbendeyil

Summary

Plugin v1.0.4 with Codex CLI 0.125 reproducibly hangs during review --background after 2-30 minutes of activity. The codex.exe app-server subprocess goes idle (CPU delta ~0s over 90s of polling, thread count constant at 19), the logfile mtime stops changing, and no error output is produced. The companion process can be killed, but the state tracker still reports phase: "running". Five bisection datapoints across CLI 0.121-0.125 and plugin v1.0.3/v1.0.4 show that this isn't a single-version regression. We did find a workaround: starting a fresh Claude-Code chat process and running a hygiene routine between reviews allowed 9 sequential reviews in one session without a hang, which suggests that chat-session state contamination is a contributing factor.

Environment

  • OS: Windows 11 Pro 10.0.26100
  • Node: v24.14.0
  • npm: 11.9.0
  • Codex CLI: 0.125.0 (0.121-0.124 also tested; see Bisection Findings)
  • Plugin: v1.0.4 (cache openai-codex/codex/1.0.4)
  • Marketplace HEAD: 807e03a
  • ~/.codex/config.toml:
model = "gpt-5.5"
model_reasoning_effort = "medium"

[windows]
sandbox = "elevated"

[projects.'C:\path\to\repo']
trust_level = "trusted"

Reproduce Recipe (minimal)

git init demo && cd demo && echo "x" > a.txt && git add a.txt && git commit -m "x"
echo "y" >> a.txt && git commit -am "y"
echo "z" >> a.txt && git commit -am "z"
# HEAD~2 is the root commit here; HEAD~3 does not exist in a three-commit history
codex review --background --base HEAD~2 --scope branch
# Hangs after Codex has run 2-16 PowerShell commands; the codex.exe subprocess then goes idle.

Expected

Review runs to completion (consistent with the behavior we observe immediately after a fresh Claude-Code chat process restart).

Actual

codex-companion.mjs exits 0, but the codex.exe app-server subprocess survives as an idle orphan. The state file still shows phase: "running" and the log receives no updates. CPU delta on the orphaned subprocess: ~0.015s over 90s. At the time of polling, the logfile mtime had been frozen for 2+ minutes.
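To spot this state without watching the process by hand, a watchdog can combine the two symptoms above: a job whose state still says "running" while its logfile has stopped changing. The sketch below assumes the job state has already been parsed into an object with a phase field and that the log path is known; looksHung and its parameters are hypothetical names, not part of the plugin. CPU sampling is left out because Node has no portable built-in API for another process's CPU time.

```javascript
const fs = require("node:fs");

// Hypothetical hang heuristic: the job state still says "running" while the
// review log has not been modified for longer than `staleMs`.
// Assumes `state` is the parsed state file with a `phase` field (assumed shape).
function looksHung(state, logPath, staleMs, now = Date.now()) {
  // Only a job that claims to be running can be hung.
  if (!state || state.phase !== "running") return false;

  let mtimeMs;
  try {
    mtimeMs = fs.statSync(logPath).mtimeMs;
  } catch {
    // No logfile (yet): not enough evidence to call it a hang.
    return false;
  }
  // A frozen logfile mtime is the symptom described above.
  return now - mtimeMs > staleMs;
}

// Example: check once with the 2-minute threshold used in the report.
// (statePath/logPath are placeholders for the real state and log locations.)
// const state = JSON.parse(fs.readFileSync(statePath, "utf8"));
// if (looksHung(state, logPath, 2 * 60 * 1000)) console.log("review appears hung");
```

Polling this every few seconds gives an early signal, but it only flags the condition; cleaning up the orphan still needs the hygiene routine or a process kill.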

Bisection Findings

| # | CLI | Plugin | Setup | Result |
|---|---|---|---|---|
| 1 | 0.124 | v1.0.3 | Worktree | Mid-review hang after Get-ChildItem -Recurse |
| 2 | 0.124 | v1.0.3 | Worktree | Hang after 50s, then 21 min of silence |
| 3 | 0.124 | v1.0.3 | Worktree | Hang after 30s, then 10 min of silence |
| 4 | 0.125 | v1.0.4 | Main repo + hygiene | Hang after 1 min / 16 PowerShell commands |
| 5 | 0.125 | v1.0.4 | Main repo + clean room | Hang after 2 commands (git diff + git diff --stat) |

We also tested 0.122 and 0.123; both failed earlier, during setup, with a spawnSync pipe deadlock on app-server --help (>4KB stdout), so they never reached the review phase. CLI 0.121 with v1.0.4 returned a server-side 400 ("newer version required") and terminated cleanly, without hanging.

Workaround Found

In a follow-up session (datapoints 6 and 7 in our bisection notes), we restarted the entire Claude-Code chat process (not just /clear, but a brand-new process) and ran the same review against the same diff. It completed cleanly (companion exit 0, "Reviewer finished", "Turn completed", full output produced). We then ran 9 sequential reviews in that one session, using only this hygiene routine between reviews:

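# Stop every running codex* process, including orphaned app-server subprocesses
Get-CimInstance Win32_Process | Where-Object { $_.Name -match '^codex' } | ForEach-Object { Stop-Process -Id $_.ProcessId -Force }
# Delete stale job state for this repo (<repohash> is the repo's state directory name)
Remove-Item ~/.claude/plugins/data/codex-openai-codex/state/<repohash>/jobs/* -Force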

All 9 reviews exited 0 and produced output, with no hangs and no orphaned codex.exe. This suggests that chat-session state contamination is a contributing factor: for example, stale file handles, leftover process-tree state, or residual stdio pipes accumulating across multiple Codex invocations within one Claude-Code process.

Hypothesis

The hang is 2-factor:

  1. CLI 0.125's handling of declined git commands: when the elevated sandbox (sandbox = "elevated") rejects a git --git-dir=... recovery attempt, CLI 0.125 enters a state it never leaves. (CLI 0.121 handled the same decline cleanly but hung one step later on git -c safe.directory=... instead.)
  2. Chat-session-state pipe/handle contamination accumulated from prior tool invocations in the same Claude-Code process.

A fresh Claude-Code process eliminates (2), which is enough to sidestep (1): neither factor appears sufficient on its own. Datapoint 5 (clean room, existing chat) hung; datapoint 6 (clean room, fresh chat) didn't.

Plugin-Code Bugs (independent, mentioned for completeness)

  • lib/process.mjs:5 calls spawnSync without a timeout. If a child pipe-deadlocks (>4KB stdout, e.g. CLI 0.122/0.123 --help), the companion hangs forever. Passing a timeout (e.g. timeout: 5000) would surface the problem as an error instead.
  • lib/process.mjs:67 invokes taskkill /PID, which MSYS path-mangles under Git Bash: /PID becomes C:/Program Files/Git/PID. Either invoke it via cmd.exe /c or use a PowerShell fallback.
  • A companion crash leaves an orphaned codex.exe plus stale state-tracker entries, and there is no cleanup path.
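The first two fixes above can be sketched in plain Node, assuming the companion runs children through child_process.spawnSync as the bullets describe. runWithTimeout, treeKillSpec and killTree are hypothetical names, not the plugin's API. For taskkill, this sketch swaps in a different technique from the cmd.exe /c suggestion: taskkill is spawned directly with an argument array and no shell, and MSYS_NO_PATHCONV=1 (Git for Windows' switch for disabling path conversion) is set in case a Git Bash layer is involved anyway. Routing through cmd.exe /c under Git Bash could hit the same problem, since /c is also a slash argument.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: run a child synchronously, but stop waiting after
// `timeoutMs` so a child that never exits surfaces as an error.
function runWithTimeout(command, args, timeoutMs = 5000, extraOptions = {}) {
  const result = spawnSync(command, args, {
    encoding: "utf8",
    timeout: timeoutMs,          // child is killed (SIGTERM by default) after this
    maxBuffer: 16 * 1024 * 1024, // room for large --help output
    windowsHide: true,
    ...extraOptions,
  });
  if (result.error) {
    // On timeout, spawnSync sets result.error with code "ETIMEDOUT".
    throw new Error(`${command} failed: ${result.error.code || result.error.message}`);
  }
  return result; // caller still checks result.status for non-zero exits
}

// Hypothetical helper: describe a taskkill call for a whole process tree.
// Spawned with an argument array and no shell, nothing sits between Node and
// taskkill.exe to rewrite "/PID"; MSYS_NO_PATHCONV=1 additionally tells a Git
// Bash layer (if one is involved anyway) not to convert slash arguments.
function treeKillSpec(pid) {
  if (!Number.isInteger(pid) || pid <= 0) {
    throw new Error(`invalid pid: ${pid}`);
  }
  return {
    command: "taskkill",
    args: ["/PID", String(pid), "/T", "/F"], // /T: include child processes, /F: force
    options: { shell: false, env: { ...process.env, MSYS_NO_PATHCONV: "1" } },
  };
}

// Windows-only: kill a process and its child processes.
function killTree(pid) {
  if (process.platform !== "win32") throw new Error("taskkill is Windows-only");
  const { command, args, options } = treeKillSpec(pid);
  return runWithTimeout(command, args, 5000, options);
}
```

With the timeout in place, a --help probe that never returns becomes an ETIMEDOUT error the companion can report, instead of a silent hang.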

Ask

Either:

  1. Fix factor (1) at the CLI level (handle git --git-dir=... decline more gracefully on Windows with sandbox = "elevated"), or
  2. Document the workaround (fresh Claude-Code process per session-start, plus the hygiene routine between reviews) in the plugin README.

The 2-factor nature makes diagnosis hard for users — we spent ~2.5h bisecting before finding the fresh-chat workaround. Either fix would close the issue for us.
