diff --git a/plugins/codex/agents/codex-rescue.md b/plugins/codex/agents/codex-rescue.md index 7009ec86..e42c8d39 100644 --- a/plugins/codex/agents/codex-rescue.md +++ b/plugins/codex/agents/codex-rescue.md @@ -20,8 +20,10 @@ Selection guidance: Forwarding rules: - Use exactly one `Bash` call to invoke `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task ...`. -- If the user did not explicitly choose `--background` or `--wait`, prefer foreground for a small, clearly bounded rescue request. -- If the user did not explicitly choose `--background` or `--wait` and the task looks complicated, open-ended, multi-step, or likely to keep Codex running for a long time, prefer background execution. +- Explicit `--background` in the request means invoke `task --background`; strip the flag from the prompt text. +- Default to `task --background` whenever the user did NOT explicitly pass `--wait`. Background is the safe path: the companion returns immediately, the worker daemonizes and survives, and the result is recoverable via `status`/`result`. +- Explicit `--wait` means invoke foreground `task` — but ONLY for a short, clearly bounded request (quick fix, single-file edit, focused diagnosis that finishes well under ~140s); strip the flag from the prompt text. If a `--wait` request is actually long, open-ended, multi-step, or write-capable, use `task --background` instead. When in doubt, `--background`. +- Why background is the default: a foreground (blocking) `task` call running inside THIS subagent is auto-backgrounded by the harness and then reaped when the subagent ends (observed at ~143s), which kills the Codex worker mid-run and silently loses the work. `task --background` avoids this entirely, so never run substantial work foreground from here. - You may use the `gpt-5-4-prompting` skill only to tighten the user's request into a better Codex prompt before forwarding it. - Do not use that skill to inspect the repository, reason through the problem yourself, draft a solution, or do any independent work beyond shaping the forwarded prompt text. - Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, cancel jobs, summarize output, or do any follow-up work of your own. diff --git a/plugins/codex/commands/rescue.md b/plugins/codex/commands/rescue.md index 56de9555..1eed1f81 100644 --- a/plugins/codex/commands/rescue.md +++ b/plugins/codex/commands/rescue.md @@ -13,10 +13,12 @@ $ARGUMENTS Execution mode: -- If the request includes `--background`, run the `codex:codex-rescue` subagent in the background. -- If the request includes `--wait`, run the `codex:codex-rescue` subagent in the foreground. -- If neither flag is present, default to foreground. -- `--background` and `--wait` are execution flags for Claude Code. Do not forward them to `task`, and do not treat them as part of the natural-language task text. +- If the request includes `--background`, pass `--background` to the `codex:codex-rescue` subagent so it invokes companion `task --background`. +- If the request includes `--wait`, pass `--wait` to the `codex:codex-rescue` subagent so it invokes foreground `task`. +- If neither flag is present, default to `--background` (the safe path: the worker daemonizes and survives; collect the result via `/codex:status` + `/codex:result`). Only stay foreground when `--wait` is explicit AND the rescue is short and clearly bounded. +- For long, substantial, or open-ended rescues, pass `--background`; the companion returns `[[codex-task status=dispatched id=]]`. A dispatched background job has no automatic push notification, so arm a watcher by running `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" status --wait --timeout-ms 1800000` via the Bash tool with `run_in_background=true`; it exits and re-invokes Claude when the job reaches a terminal status. If it returns while the job is still running, re-arm the same command. Fetch `/codex:result ` only after the watcher reports a terminal status. +- A foreground rescue running inside the subagent is auto-backgrounded by the harness and then reaped when the subagent ends (observed at ~143s — much sooner than the ~600s direct-Bash cap), which kills the Codex worker mid-run and loses the result. Route anything substantial to `--background`. +- `--background` and `--wait` are execution flags for Claude Code and the rescue subagent. Preserve the execution choice, and strip them only from the natural-language task text before Codex sees the prompt. - `--model` and `--effort` are runtime-selection flags. Preserve them for the forwarded `task` call, but do not treat them as part of the natural-language task text. - If the request includes `--resume`, do not ask whether to continue. The user already chose. - If the request includes `--fresh`, do not ask whether to continue. The user already chose. @@ -41,9 +43,17 @@ Operating rules: - The subagent is a thin forwarder only. It should use one `Bash` call to invoke `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task ...` and return that command's stdout as-is. - Return the Codex companion stdout verbatim to the user. - Do not paraphrase, summarize, rewrite, or add commentary before or after it. +- The companion stdout carries a deterministic sentinel on a standalone line: `[[codex-task status=complete]]` means a foreground task has completed and exited; `[[codex-task status=dispatched id=]]` means a companion background job was queued. Trust this sentinel, companion state, and the PostToolUse hook over any prose. A dispatched sentinel is not an automatic-notification promise; never treat a dispatched/background job as done until `codex-companion.mjs status --wait` returns a terminal status from a Bash watcher launched with `run_in_background=true`. +- The Claude Code harness appends an `agentId: (use SendMessage with to: '' to continue this agent)` line to every `Agent(codex:codex-rescue)` return. This is a resume token for continuing the same agent thread, NOT a "still running" or "in background" status signal. By the time you see it, the subagent has finished. Do not paraphrase it as "Codex is running in background" or "task dispatched async" — the stdout above the suffix is the actual completed output, which you return verbatim per the rule above. - Do not ask the subagent to inspect files, monitor progress, poll `/codex:status`, fetch `/codex:result`, call `/codex:cancel`, summarize output, or do follow-up work of its own. - Leave `--effort` unset unless the user explicitly asks for a specific reasoning effort. - Leave the model unset unless the user explicitly asks for one. If they ask for `spark`, map it to `gpt-5.3-codex-spark`. - Leave `--resume` and `--fresh` in the forwarded request. The subagent handles that routing when it builds the `task` command. - If the helper reports that Codex is missing or unauthenticated, stop and tell the user to run `/codex:setup`. - If the user did not supply a request, ask what Codex should investigate or fix. + +Workspace boundary: + +- A Codex `task` run can only create or modify files **within its working directory** (the Codex `workspace-write` sandbox is rooted at the run's `--cwd`). Files outside that directory are read-only. +- The `Agent`-tool `codex:codex-rescue` dispatch inherits **this Claude session's** cwd, and the `Agent` tool cannot change it. So a rescue request that needs to write to a *different* repository will silently fail to write (Codex can only probe, not edit) — do not route cross-repo write work through the subagent. +- To target a different repository, either run from a Claude session inside that repo, or invoke the companion directly with `--cwd `: `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" task --cwd --write ...`. diff --git a/plugins/codex/hooks/hooks.json b/plugins/codex/hooks/hooks.json index 19e33b81..46c9412d 100644 --- a/plugins/codex/hooks/hooks.json +++ b/plugins/codex/hooks/hooks.json @@ -1,5 +1,5 @@ { - "description": "Optional stop-time review gate for Codex Companion.", + "description": "Codex Companion lifecycle hooks, stop-time review gate, and codex-rescue completion context.", "hooks": { "SessionStart": [ { @@ -33,6 +33,18 @@ } ] } + ], + "PostToolUse": [ + { + "matcher": "Agent", + "hooks": [ + { + "type": "command", + "command": "node \"${CLAUDE_PLUGIN_ROOT}/scripts/codex-rescue-completion-hook.mjs\"", + "timeout": 5 + } + ] + } ] } } diff --git a/plugins/codex/scripts/codex-companion.mjs b/plugins/codex/scripts/codex-companion.mjs index 35222fd5..adf0c65e 100644 --- a/plugins/codex/scripts/codex-companion.mjs +++ b/plugins/codex/scripts/codex-companion.mjs @@ -29,6 +29,7 @@ import { getConfig, listJobs, setConfig, + updateState, upsertJob, writeJobFile } from "./lib/state.mjs"; @@ -61,11 +62,14 @@ import { renderStatusReport, renderTaskResult } from "./lib/render.mjs"; +import { buildTaskDispatchedStatusToken } from "./lib/task-status-token.mjs"; const ROOT_DIR = path.resolve(fileURLToPath(new URL("..", import.meta.url))); const REVIEW_SCHEMA = path.join(ROOT_DIR, "schemas", "review-output.schema.json"); const DEFAULT_STATUS_WAIT_TIMEOUT_MS = 240000; const DEFAULT_STATUS_POLL_INTERVAL_MS = 2000; +const FOREGROUND_TASK_POLL_INTERVAL_MS = 100; +const FOREGROUND_TASK_MISSING_JOB_RETRY_MS = 5000; const VALID_REASONING_EFFORTS = new Set(["none", "minimal", "low", "medium", "high", "xhigh"]); const MODEL_ALIASES = new Map([["spark", "gpt-5.3-codex-spark"]]); const STOP_REVIEW_TASK_MARKER = "Run a stop-gate review of the previous Claude turn."; @@ -157,6 +161,38 @@ function sleep(ms) { return new Promise((resolve) => setTimeout(resolve, ms)); } +function normalizeWorkerPid(pid) { + const normalized = Number(pid); + return Number.isInteger(normalized) && normalized > 0 ? normalized : null; +} + +function isProcessAlive(pid) { + // A pid that answers signal 0 still has a live process table entry. EPERM also + // means the process exists but this user cannot signal it. + const normalizedPid = normalizeWorkerPid(pid); + if (!normalizedPid) { + return false; + } + + try { + process.kill(normalizedPid, 0); + return true; + } catch (error) { + return error?.code === "EPERM"; + } +} + +function hasExitedActiveWorker(job) { + // Active jobs with a recorded worker pid should keep that worker alive until + // the job reaches a terminal state; a missing process means launch failed. + const workerPid = normalizeWorkerPid(job?.pid); + return Boolean(workerPid && isActiveJobStatus(job?.status) && !isProcessAlive(workerPid)); +} + +function isMissingJobError(error) { + return String(error?.message ?? "").startsWith("No job found for "); +} + function shorten(text, limit = 96) { const normalized = String(text ?? "").trim().replace(/\s+/g, " "); if (!normalized) { @@ -288,6 +324,22 @@ function isActiveJobStatus(status) { return status === "queued" || status === "running"; } +function isTerminalJobRecord(stateJob, storedJob = null) { + // PR #346 review: launch/cancel races can expose either the summary state row + // or the per-job file first, so both records must agree the job is active + // before a launcher writes worker-owned fields such as pid. + return !isActiveJobStatus(stateJob?.status) || Boolean(storedJob && !isActiveJobStatus(storedJob.status)); +} + +function resolveLaunchStatus(stateJob, storedJob = null, fallback = "queued") { + for (const status of [stateJob?.status, storedJob?.status]) { + if (status && !isActiveJobStatus(status)) { + return status; + } + } + return stateJob?.status ?? storedJob?.status ?? fallback; +} + function getCurrentClaudeSessionId() { return process.env[SESSION_ID_ENV] ?? null; } @@ -315,17 +367,49 @@ function findLatestResumableTaskJob(jobs) { async function waitForSingleJobSnapshot(cwd, reference, options = {}) { const timeoutMs = Math.max(0, Number(options.timeoutMs) || DEFAULT_STATUS_WAIT_TIMEOUT_MS); const pollIntervalMs = Math.max(100, Number(options.pollIntervalMs) || DEFAULT_STATUS_POLL_INTERVAL_MS); + const retryMissingJobMs = Math.max(0, Number(options.retryMissingJobMs) || FOREGROUND_TASK_MISSING_JOB_RETRY_MS); const deadline = Date.now() + timeoutMs; - let snapshot = buildSingleJobSnapshot(cwd, reference); + const missingJobDeadline = Date.now() + retryMissingJobMs; + + const readSnapshot = () => { + try { + return buildSingleJobSnapshot(cwd, reference); + } catch (error) { + if (options.retryMissingJob && isMissingJobError(error) && Date.now() < missingJobDeadline) { + return null; + } + throw error; + } + }; + + let snapshot = readSnapshot(); + + while ((!snapshot || isActiveJobStatus(snapshot.job.status)) && Date.now() < deadline) { + if (snapshot && options.failWhenWorkerExits && hasExitedActiveWorker(snapshot.job)) { + return { + ...snapshot, + waitTimedOut: false, + workerExited: true, + timeoutMs + }; + } + + // Foreground task workers write state concurrently with the foreground + // waiter. If a transient read sees no parseable job, retry briefly instead + // of failing before the queued record becomes readable. + const activeDeadline = snapshot ? deadline : Math.min(deadline, missingJobDeadline); + await sleep(Math.min(pollIntervalMs, Math.max(0, activeDeadline - Date.now()))); + snapshot = readSnapshot(); + } - while (isActiveJobStatus(snapshot.job.status) && Date.now() < deadline) { - await sleep(Math.min(pollIntervalMs, Math.max(0, deadline - Date.now()))); + if (!snapshot) { snapshot = buildSingleJobSnapshot(cwd, reference); } return { ...snapshot, waitTimedOut: isActiveJobStatus(snapshot.job.status), + workerExited: Boolean(options.failWhenWorkerExits && hasExitedActiveWorker(snapshot.job)), timeoutMs }; } @@ -488,6 +572,8 @@ async function executeTaskRun(request) { sandbox: request.write ? "workspace-write" : "read-only", onProgress: request.onProgress, persistThread: true, + // Keep the workspace-boundary notice ephemeral by adding it only to turn input, not stored prompts or thread names. + boundaryNote: request.write ? buildWorkspaceBoundaryNotice(workspaceRoot) : null, threadName: resumeThreadId ? null : buildPersistentTaskThreadName(request.prompt || DEFAULT_CONTINUE_PROMPT) }); @@ -497,6 +583,7 @@ async function executeTaskRun(request) { { rawOutput, failureMessage, + status: result.status, reasoningSummary: result.reasoningSummary }, { @@ -551,7 +638,24 @@ function buildTaskRunMetadata({ prompt, resumeLast = false }) { } function renderQueuedTaskLaunch(payload) { - return `${payload.title} started in the background as ${payload.jobId}. Check /codex:status ${payload.jobId} for progress.\n`; + // PR #346 review: a terminal enqueue payload means launch did not dispatch a + // worker, so emitting the dispatched sentinel would make hooks and humans + // believe there is live background work to poll. + if (!isActiveJobStatus(payload.status)) { + if (payload.status === "cancelled") { + return `Codex task ${payload.jobId} was cancelled before a worker launched; no work started.\n`; + } + const detail = payload.errorMessage ? `: ${payload.errorMessage}` : "."; + return `Codex task ${payload.jobId} failed before a worker launched${detail}\n`; + } + + const statusToken = buildTaskDispatchedStatusToken(payload.jobId); + return [ + statusToken, + `${payload.title} dispatched as background job ${payload.jobId}.`, + `No automatic notification will arrive; poll /codex:status ${payload.jobId}.`, + `To be notified on completion, run with the Bash tool (run_in_background): node "\${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" status ${payload.jobId} --wait --timeout-ms 1800000 (re-arm it if it returns still-running).` + ].join("\n") + "\n"; } function getJobKindLabel(kind, jobClass) { @@ -610,6 +714,10 @@ function buildTaskRequest({ cwd, model, effort, prompt, write, resumeLast, jobId }; } +function buildWorkspaceBoundaryNotice(cwd) { + return `WORKSPACE: this run has write access ONLY within ${cwd}. Everything outside ${cwd} is read-only (Codex workspace-write sandbox). Do NOT probe, test, or investigate the sandbox boundary. If completing this task requires creating or modifying files outside ${cwd}, stop immediately and report that it must be re-dispatched with \`--cwd \` — do not attempt workarounds.`; +} + function readTaskPrompt(cwd, options, positionals) { if (options["prompt-file"]) { return fs.readFileSync(path.resolve(cwd, options["prompt-file"]), "utf8"); @@ -638,39 +746,219 @@ async function runForegroundCommand(job, runner, options = {}) { return execution; } -function spawnDetachedTaskWorker(cwd, jobId) { - const scriptPath = path.join(ROOT_DIR, "scripts", "codex-companion.mjs"); - const child = spawn(process.execPath, [scriptPath, "task-worker", "--cwd", cwd, "--job-id", jobId], { - cwd, - env: process.env, - detached: true, - stdio: "ignore", - windowsHide: true +function formatSpawnFailureMessage(error) { + const detail = error instanceof Error ? error.message : String(error ?? ""); + return detail ? `Failed to launch background worker: ${detail}` : "Failed to launch background worker."; +} + +function persistQueuedTaskRecord(workspaceRoot, queuedRecord) { + // PR #346 review: the launch record is inserted through updateState so the + // summary row and stored job file are created from the same read-modify-write + // transition before any worker is allowed to observe the request. + updateState(workspaceRoot, (state) => { + const existingIndex = state.jobs.findIndex((candidate) => candidate.id === queuedRecord.id); + const nextRecord = { + ...queuedRecord, + updatedAt: nowIso() + }; + if (existingIndex === -1) { + state.jobs.unshift(nextRecord); + } else { + state.jobs[existingIndex] = { + ...state.jobs[existingIndex], + ...nextRecord + }; + } + writeJobFile(workspaceRoot, queuedRecord.id, nextRecord); }); +} + +function markTaskLaunchFailed(workspaceRoot, jobId, errorMessage, fallbackLogFile = null) { + let failedJob = null; + let preservedJob = null; + + // PR #346 review: launch failures must become a durable failed job via one + // state read-modify-write, otherwise foreground waiters can hang on a queued + // record whose worker never existed. + updateState(workspaceRoot, (state) => { + const jobIndex = state.jobs.findIndex((candidate) => candidate.id === jobId); + if (jobIndex === -1) { + return; + } + + const stateJob = state.jobs[jobIndex]; + const storedJob = readStoredJob(workspaceRoot, jobId); + if (isTerminalJobRecord(stateJob, storedJob)) { + preservedJob = { + ...stateJob, + ...(storedJob ?? {}) + }; + return; + } + + const completedAt = nowIso(); + failedJob = { + ...(storedJob ?? {}), + ...stateJob, + status: "failed", + phase: "failed", + pid: null, + completedAt, + updatedAt: completedAt, + errorMessage, + logFile: stateJob.logFile ?? storedJob?.logFile ?? fallbackLogFile + }; + state.jobs[jobIndex] = failedJob; + writeJobFile(workspaceRoot, jobId, failedJob); + }); + + if (failedJob) { + appendLogLine(failedJob.logFile, errorMessage); + return failedJob; + } + return preservedJob; +} + +function persistSpawnedTaskWorkerPid(workspaceRoot, jobId, childPid) { + let shouldKillWorker = false; + let launchStatus = "queued"; + let updatedJob = null; + + // PR #346 review: pid persistence must re-check cancellation inside the same + // state read-modify-write that writes the pid. A cancel that already made the + // job terminal wins, and the just-spawned worker is killed after the state + // update instead of being resurrected by stale launch data. + updateState(workspaceRoot, (state) => { + const jobIndex = state.jobs.findIndex((candidate) => candidate.id === jobId); + if (jobIndex === -1) { + shouldKillWorker = true; + launchStatus = "cancelled"; + return; + } + + const stateJob = state.jobs[jobIndex]; + const storedJob = readStoredJob(workspaceRoot, jobId); + if (isTerminalJobRecord(stateJob, storedJob)) { + shouldKillWorker = true; + launchStatus = resolveLaunchStatus(stateJob, storedJob, "cancelled"); + updatedJob = { + ...stateJob, + ...(storedJob ?? {}) + }; + return; + } + + const updatedAt = nowIso(); + updatedJob = { + ...(storedJob ?? {}), + ...stateJob, + pid: childPid, + updatedAt + }; + state.jobs[jobIndex] = updatedJob; + writeJobFile(workspaceRoot, jobId, updatedJob); + launchStatus = updatedJob.status; + }); + + return { shouldKillWorker, launchStatus, job: updatedJob }; +} + +function spawnDetachedTaskWorker(cwd, jobId, options = {}) { + const scriptPath = path.join(ROOT_DIR, "scripts", "codex-companion.mjs"); + let child = null; + + try { + child = spawn(process.execPath, [scriptPath, "task-worker", "--cwd", cwd, "--job-id", jobId], { + cwd, + env: process.env, + detached: true, + stdio: "ignore", + windowsHide: true + }); + } catch (error) { + return { child: null, error }; + } + + const handleSpawnError = (error) => { + options.onError?.(error); + }; + if (typeof child.once === "function") { + child.once("error", handleSpawnError); + } else if (typeof child.on === "function") { + child.on("error", handleSpawnError); + } + child.unref(); - return child; + if (!normalizeWorkerPid(child.pid)) { + return { child, error: new Error("missing worker pid") }; + } + return { child, error: null }; } function enqueueBackgroundTask(cwd, job, request) { const { logFile } = createTrackedProgress(job); appendLogLine(logFile, "Queued for background execution."); - const child = spawnDetachedTaskWorker(cwd, job.id); + // The queued record is written before spawning so the detached worker can always + // load its request, and foreground callers can wait on the job immediately. const queuedRecord = { ...job, status: "queued", phase: "queued", - pid: child.pid ?? null, + pid: null, logFile, request }; - writeJobFile(job.workspaceRoot, job.id, queuedRecord); - upsertJob(job.workspaceRoot, queuedRecord); + persistQueuedTaskRecord(job.workspaceRoot, queuedRecord); + + // PR #346 review: /codex:cancel can land after the durable queued record is + // written but before worker launch; re-read the job and do not spawn a worker + // for a record that is already terminal. + const latestQueuedRecord = readStoredJob(job.workspaceRoot, job.id) ?? queuedRecord; + if (!isActiveJobStatus(latestQueuedRecord.status)) { + appendLogLine(logFile, `Skipped background worker launch because job is ${latestQueuedRecord.status}.`); + return { + payload: { + jobId: job.id, + status: latestQueuedRecord.status ?? "cancelled", + title: job.title, + summary: job.summary, + logFile + }, + logFile + }; + } + + const recordLaunchFailure = (error) => + markTaskLaunchFailed(job.workspaceRoot, job.id, formatSpawnFailureMessage(error), logFile); + const { child, error: spawnError } = spawnDetachedTaskWorker(cwd, job.id, { + onError: recordLaunchFailure + }); + if (spawnError || !child) { + const failedJob = recordLaunchFailure(spawnError ?? new Error("missing child process")); + return { + payload: { + jobId: job.id, + status: failedJob?.status ?? "failed", + title: job.title, + summary: job.summary, + logFile, + errorMessage: failedJob?.errorMessage ?? formatSpawnFailureMessage(spawnError) + }, + logFile + }; + } + + const { shouldKillWorker, launchStatus } = persistSpawnedTaskWorkerPid(job.workspaceRoot, job.id, child.pid); + if (shouldKillWorker) { + appendLogLine(logFile, `Terminating background worker because job is ${launchStatus}.`); + terminateProcessTree(child.pid); + } return { payload: { jobId: job.id, - status: "queued", + status: launchStatus, title: job.title, summary: job.summary, logFile @@ -679,6 +967,137 @@ function enqueueBackgroundTask(cwd, job, request) { }; } +function buildWorkerExitedError(jobId) { + // The worker drives the turn, but the shared app-server applies file edits, so + // edits in flight when the worker died can still land AFTER this point. Carry + // the verify-on-disk discipline in the message so callers re-check rather than + // trust a single (possibly mid-write) snapshot. + return `background worker exited before completing; check /codex:status ${jobId}. Files it was editing may have already landed or may still be landing via the shared app-server — re-check disk state (\`git status\`) before concluding, and do not trust a single snapshot.`; +} + +function failActiveWorkerJob(workspaceRoot, job, errorMessage) { + // Re-read before marking failed so a concurrent cancellation or completion is + // not overwritten by the foreground waiter. + const latestStoredJob = readStoredJob(workspaceRoot, job.id) ?? {}; + const latestJob = { + ...job, + ...latestStoredJob + }; + if (!isActiveJobStatus(latestJob.status)) { + return latestJob; + } + + const completedAt = nowIso(); + const failedJob = { + ...latestJob, + status: "failed", + phase: "failed", + pid: null, + completedAt, + errorMessage + }; + + writeJobFile(workspaceRoot, job.id, failedJob); + upsertJob(workspaceRoot, { + id: job.id, + status: "failed", + phase: "failed", + pid: null, + completedAt, + errorMessage + }); + appendLogLine(failedJob.logFile, errorMessage); + return failedJob; +} + +function ensureTrailingNewline(value) { + const output = String(value ?? ""); + return output.endsWith("\n") ? output : `${output}\n`; +} + +function renderStoredTaskWorkerResult(job, storedJob) { + if (typeof storedJob?.rendered === "string" && storedJob.rendered) { + return ensureTrailingNewline(storedJob.rendered); + } + return renderStoredJobResult(job, storedJob); +} + +function resolveStoredTaskExitStatus(job, storedJob) { + if (job.status === "completed") { + return 0; + } + + const storedStatus = Number(storedJob?.result?.status); + if (Number.isInteger(storedStatus) && storedStatus !== 0) { + return storedStatus; + } + + return 1; +} + +function buildStoredTaskWorkerPayload(job, storedJob) { + const hasStoredResult = + storedJob?.result && typeof storedJob.result === "object" && !Array.isArray(storedJob.result); + const storedResult = hasStoredResult ? storedJob.result : {}; + return { + ...storedResult, + job, + storedJob + }; +} + +async function runForegroundTaskWorker(cwd, job, request, options = {}) { + enqueueBackgroundTask(cwd, job, request); + + // Foreground tasks run in the same detached worker as background tasks, then wait inline for + // the stored result so Bash auto-backgrounding and subagent teardown do not kill the Codex turn. + const snapshot = await waitForSingleJobSnapshot(cwd, job.id, { + // PR #346 review: foreground xhigh waits must survive beyond the 240s status default. + timeoutMs: Infinity, + pollIntervalMs: FOREGROUND_TASK_POLL_INTERVAL_MS, + retryMissingJob: true, + // PR #346 review: an unbounded wait must still fail fast when its detached + // worker pid has exited while the job remains queued/running. + failWhenWorkerExits: true + }); + if (snapshot.workerExited) { + const errorMessage = buildWorkerExitedError(snapshot.job.id); + const failedJob = failActiveWorkerJob(snapshot.workspaceRoot, snapshot.job, errorMessage); + if (options.json) { + outputCommandResult({ job: failedJob, errorMessage }, `${errorMessage}\n`, true); + } else { + process.stderr.write(`${errorMessage}\n`); + } + process.exitCode = 1; + return; + } + if (snapshot.waitTimedOut) { + outputCommandResult(snapshot, renderJobStatusReport(snapshot.job), options.json); + process.exitCode = 1; + return; + } + + const storedJob = readStoredJob(snapshot.workspaceRoot, snapshot.job.id); + const payload = buildStoredTaskWorkerPayload(snapshot.job, storedJob); + const exitStatus = resolveStoredTaskExitStatus(snapshot.job, storedJob); + const rendered = renderStoredTaskWorkerResult(snapshot.job, storedJob); + + if ( + snapshot.job.status === "failed" && + !storedJob?.rendered && + storedJob?.errorMessage && + !options.json + ) { + process.stderr.write(`${storedJob.errorMessage}\n`); + } else { + outputCommandResult(payload, rendered, options.json); + } + + if (exitStatus !== 0) { + process.exitCode = exitStatus; + } +} + async function handleReviewCommand(argv, config) { const { options, positionals } = parseCommandInput(argv, { valueOptions: ["base", "scope", "model", "cwd"], @@ -732,7 +1151,7 @@ async function handleReview(argv) { async function handleTask(argv) { const { options, positionals } = parseCommandInput(argv, { valueOptions: ["model", "effort", "cwd", "prompt-file"], - booleanOptions: ["json", "write", "resume-last", "resume", "fresh", "background"], + booleanOptions: ["json", "write", "resume-last", "resume", "fresh", "background", "wait"], aliasMap: { m: "model" } @@ -749,6 +1168,9 @@ async function handleTask(argv) { if (resumeLast && fresh) { throw new Error("Choose either --resume/--resume-last or --fresh."); } + if (options.background && options.wait) { + throw new Error("Choose either --background or --wait, not both."); + } const write = Boolean(options.write); const taskMetadata = buildTaskRunMetadata({ prompt, @@ -771,23 +1193,34 @@ async function handleTask(argv) { }); const { payload } = enqueueBackgroundTask(cwd, job, request); outputCommandResult(payload, renderQueuedTaskLaunch(payload), options.json); + // PR #346 review: exit non-zero unless a live background job was actually + // dispatched (queued/running). A terminal launch — `failed`, or `cancelled` + // when a concurrent cancel won before worker spawn — means there is no job to + // poll, so callers keying off exit status must not treat it as a successful + // dispatch (they would skip retry/escalation). + if (!isActiveJobStatus(payload.status)) { + process.exitCode = 1; + } return; } + ensureCodexAvailable(cwd); + requireTaskRequest(prompt, resumeLast); + const job = buildTaskJob(workspaceRoot, taskMetadata, write); - await runForegroundCommand( + const request = buildTaskRequest({ + cwd, + model, + effort, + prompt, + write, + resumeLast, + jobId: job.id + }); + await runForegroundTaskWorker( + cwd, job, - (progress) => - executeTaskRun({ - cwd, - model, - effort, - prompt, - write, - resumeLast, - jobId: job.id, - onProgress: progress - }), + request, { json: options.json } ); } @@ -837,6 +1270,16 @@ async function handleTaskWorker(argv) { ); } +function reconcileExitedStatusWorker(snapshot) { + if (!snapshot.workerExited && !hasExitedActiveWorker(snapshot.job)) { + return snapshot.job; + } + + // Watcher dead-worker reliability fix: status/status --wait are the first + // readers that can convert a stale running worker into a durable failure. + return failActiveWorkerJob(snapshot.workspaceRoot, snapshot.job, buildWorkerExitedError(snapshot.job.id)); +} + async function handleStatus(argv) { const { options, positionals } = parseCommandInput(argv, { valueOptions: ["cwd", "timeout-ms", "poll-interval-ms"], @@ -849,10 +1292,22 @@ async function handleStatus(argv) { const snapshot = options.wait ? await waitForSingleJobSnapshot(cwd, reference, { timeoutMs: options["timeout-ms"], - pollIntervalMs: options["poll-interval-ms"] + pollIntervalMs: options["poll-interval-ms"], + // Watcher dead-worker reliability fix: the background completion + // watcher reaches detached workers through status --wait. + failWhenWorkerExits: true }) : buildSingleJobSnapshot(cwd, reference); - outputCommandResult(snapshot, renderJobStatusReport(snapshot.job), options.json); + if (snapshot.workerExited) { + const failedJob = reconcileExitedStatusWorker(snapshot); + outputCommandResult({ ...snapshot, job: failedJob }, renderJobStatusReport(failedJob), options.json); + return; + } + + // Watcher dead-worker reliability fix: a plain status read also repairs + // stale running state for result, hooks, and later status callers. + const reconciledJob = reconcileExitedStatusWorker(snapshot); + outputCommandResult({ ...snapshot, job: reconciledJob }, renderJobStatusReport(reconciledJob), options.json); return; } @@ -944,27 +1399,35 @@ async function handleCancel(argv) { appendLogLine(job.logFile, "Cancelled by user."); const completedAt = nowIso(); - const nextJob = { - ...job, - status: "cancelled", - phase: "cancelled", - pid: null, - completedAt, - errorMessage: "Cancelled by user." - }; + let nextJob = null; + // PR #346 review: cancellation is the terminal launch/cancel transition, so + // the state row and stored job file are updated from one read-modify-write + // instead of a stale stored-job write followed by a separate state patch. + updateState(workspaceRoot, (state) => { + const jobIndex = state.jobs.findIndex((candidate) => candidate.id === job.id); + const stateJob = jobIndex === -1 ? job : state.jobs[jobIndex]; + const latestStoredJob = readStoredJob(workspaceRoot, job.id) ?? existing; + nextJob = { + ...latestStoredJob, + ...stateJob, + status: "cancelled", + phase: "cancelled", + pid: null, + completedAt, + updatedAt: completedAt, + errorMessage: "Cancelled by user." + }; - writeJobFile(workspaceRoot, job.id, { - ...existing, - ...nextJob, - cancelledAt: completedAt - }); - upsertJob(workspaceRoot, { - id: job.id, - status: "cancelled", - phase: "cancelled", - pid: null, - errorMessage: "Cancelled by user.", - completedAt + if (jobIndex === -1) { + state.jobs.unshift(nextJob); + } else { + state.jobs[jobIndex] = nextJob; + } + + writeJobFile(workspaceRoot, job.id, { + ...nextJob, + cancelledAt: completedAt + }); }); const payload = { diff --git a/plugins/codex/scripts/codex-rescue-completion-hook.mjs b/plugins/codex/scripts/codex-rescue-completion-hook.mjs new file mode 100644 index 00000000..31168501 --- /dev/null +++ b/plugins/codex/scripts/codex-rescue-completion-hook.mjs @@ -0,0 +1,239 @@ +#!/usr/bin/env node + +import fs from "node:fs"; +import { fileURLToPath } from "node:url"; +import process from "node:process"; + +import { loadState } from "./lib/state.mjs"; +import { parseTaskStatusToken } from "./lib/task-status-token.mjs"; + +const COMPANION_SCRIPT_PATH = fileURLToPath(new URL("./codex-companion.mjs", import.meta.url)); +const COMPLETE_CONTEXT = + "codex-rescue has COMPLETED and exited. The text above is the final result. Do NOT wait for a notification or poll status. If it ran with --write, verify changed files with git status."; +const NEUTRAL_EXIT_CONTEXT = + "codex-rescue has exited (synchronous return — it is not running). Treat the text above as its result and verify on disk (`git status` if it ran with --write); do not wait for a notification or poll status."; +const FAILURE_CONTEXT = + "codex-rescue exited WITHOUT a success signal — it is not running, so do not wait for a notification, but the run may have failed or produced no result. Review the output above and `git status`, then re-run or escalate instead of treating it as done."; +// Known non-success statuses. Without the complete sentinel, one of these (or +// an empty body) means there is no success signal, so we must not claim the +// run succeeded (PR #346 review P1). +const FAILURE_STATUSES = new Set(["failed", "fail", "error", "errored", "cancelled", "canceled", "timed_out", "timeout", "aborted"]); +const ACTIVE_TASK_STATUSES = new Set(["queued", "running"]); +const NON_TERMINAL_AGENT_STATUSES = new Set(["async_launched", "running", "sub_agent_entered"]); +const BASH_AUTO_BACKGROUND_MARKER = "Command running in background with ID:"; +const BASH_AUTO_BACKGROUND_OUTPUT_PREFIX = "Output is being written to:"; +// PR #346 review: state fallback is only safe when this synchronous return +// contains invocation-specific evidence that the task was actually backgrounded. +const BACKGROUND_INTENT_PATTERN = + /running in (?:the\s+)?background|backgrounded|dispatched|you'?ll be notified|you will be notified|will notify|will be notified/i; + +function buildWatcherContext(jobId) { + return `codex-rescue background job ${jobId} is RUNNING — there is no automatic push notification. To be notified, arm a watcher: run this via the Bash tool with run_in_background=true: node "${COMPANION_SCRIPT_PATH}" status ${jobId} --wait --timeout-ms 1800000 — it blocks until the job is terminal, then exits and re-invokes you. If it returns and the job is still running, re-arm the same command. Do NOT treat the job as done until the watcher reports a terminal status.`; +} + +function extractBashAutoBackgroundOutputPath(text) { + const prefixIndex = text.indexOf(BASH_AUTO_BACKGROUND_OUTPUT_PREFIX); + if (prefixIndex === -1) { + return null; + } + + const afterPrefix = text.slice(prefixIndex + BASH_AUTO_BACKGROUND_OUTPUT_PREFIX.length).trim(); + const endMarkers = [". You will be notified", "\n", "\r"]; + let endIndex = afterPrefix.length; + for (const marker of endMarkers) { + const markerIndex = afterPrefix.indexOf(marker); + if (markerIndex !== -1 && markerIndex < endIndex) { + endIndex = markerIndex; + } + } + + const outputPath = afterPrefix.slice(0, endIndex).trim(); + return outputPath || null; +} + +function buildBashAutoBackgroundContext(outputPath) { + const followUp = outputPath + ? `re-check \`git status\` (if it ran with --write) and/or read the streamed output at ${outputPath}` + : "re-check `git status` (if it ran with --write)"; + return `codex-rescue's Codex run exceeded the foreground time cap and was auto-backgrounded by the Bash tool; it is STILL RUNNING detached. No completion notification will arrive. Do not wait passively — ${followUp} until the run lands, then act on the result.`; +} + +function readHookInput() { + const raw = fs.readFileSync(0, "utf8").trim(); + if (!raw) { + return {}; + } + return JSON.parse(raw); +} + +function emitAdditionalContext(additionalContext) { + process.stdout.write( + `${JSON.stringify({ + hookSpecificOutput: { + hookEventName: "PostToolUse", + additionalContext + } + })}\n` + ); +} + +function isCodexRescueAgentToolUse(input) { + if (input?.hook_event_name !== "PostToolUse" || input?.tool_name !== "Agent") { + return false; + } + + const subagentType = String( + input?.tool_input?.subagent_type ?? input?.tool_input?.subagentType ?? input?.tool_input?.agent_type ?? "" + ).toLowerCase(); + return subagentType.includes("codex-rescue"); +} + +function isSynchronousAgentReturn(toolResponse) { + const status = String(toolResponse?.status ?? "").toLowerCase(); + // PR #346 review (P2): sub_agent_entered is an interactive handoff, not a terminal + // return — the rescue agent is still active, so the completion hook must stay silent. + return !NON_TERMINAL_AGENT_STATUSES.has(status); +} + +function getHookCwd(input) { + const cwd = input?.cwd; + return typeof cwd === "string" && cwd.trim() ? cwd : process.cwd(); +} + +function compareJobsNewestFirst(left, right) { + return String(right.updatedAt ?? right.createdAt ?? "").localeCompare(String(left.updatedAt ?? left.createdAt ?? "")); +} + +function findNewestActiveTaskJob(cwd, sessionId) { + const activeTaskJobs = [...loadState(cwd).jobs].filter( + (job) => job?.jobClass === "task" && ACTIVE_TASK_STATUSES.has(String(job.status ?? "")) + ); + const normalizedSessionId = typeof sessionId === "string" && sessionId.trim() ? sessionId : null; + // Companion task jobs are session-owned; when Claude gives us the current + // session id, only infer active state from jobs created by this same session. + // Older hook inputs had no session_id, so those preserve the unscoped fallback. + const candidateJobs = normalizedSessionId + ? activeTaskJobs.filter((job) => job?.sessionId === normalizedSessionId) + : activeTaskJobs; + return candidateJobs.sort(compareJobsNewestFirst)[0] ?? null; +} + +function isFailedOrEmptyReturn(toolResponse, responseText) { + // codex-rescue returns nothing when Codex can't be invoked; an empty body or + // a known failure status means there is no success signal to report. + if (String(responseText ?? "").trim() === "") { + return true; + } + const status = String(toolResponse?.status ?? "").toLowerCase(); + return FAILURE_STATUSES.has(status); +} + +function hasFailureStatus(toolResponse) { + return FAILURE_STATUSES.has(String(toolResponse?.status ?? "").toLowerCase()); +} + +function collectText(value) { + if (value == null) { + return ""; + } + if (typeof value === "string") { + return value; + } + if (Array.isArray(value)) { + return value.map((entry) => collectText(entry)).filter(Boolean).join("\n"); + } + if (typeof value !== "object") { + return ""; + } + + const textParts = []; + if (typeof value.text === "string") { + textParts.push(value.text); + } + if (typeof value.content === "string") { + textParts.push(value.content); + } else if (Array.isArray(value.content)) { + textParts.push(collectText(value.content)); + } + if (typeof value.result === "string") { + textParts.push(value.result); + } + if (typeof value.stdout === "string") { + textParts.push(value.stdout); + } + return textParts.filter(Boolean).join("\n"); +} + +function buildCompletionContext(input) { + const toolResponse = input?.tool_response; + if (!isSynchronousAgentReturn(toolResponse)) { + return null; + } + + const responseText = collectText(toolResponse); + // PR #346 review (P1): the structured Agent tool status is authoritative over any + // [[codex-task ...]] line in the body. A failed return whose text echoes a complete/ + // dispatched token must not be reclassified as success or an active dispatch, so a + // known failure status short-circuits before token handling. + if (hasFailureStatus(toolResponse)) { + return FAILURE_CONTEXT; + } + + const token = parseTaskStatusToken(responseText); + // The complete sentinel is the only positive proof of a successful run; the + // companion stamps it solely on real completion, so it is the lone path that + // asserts success (PR #346 review P1). + if (token?.status === "complete") { + return COMPLETE_CONTEXT; + } + // Status tokens are emitted by this codex-rescue return, so they are more + // authoritative than any task state that may belong to another concurrent run. + if (token?.status === "dispatched" && token.id) { + return buildWatcherContext(token.id); + } + + // Bash auto-backgrounds commands that exceed the ~600s foreground cap. That + // synchronous Agent return means the Bash wrapper detached, not that Codex + // completed, so report the still-running state before any completion claim. + if (responseText.includes(BASH_AUTO_BACKGROUND_MARKER)) { + return buildBashAutoBackgroundContext(extractBashAutoBackgroundOutputPath(responseText)); + } + + // PR #346 review: only consult global companion state after this return says + // the task was backgrounded; otherwise an unrelated active task can hide a + // sentinel-less failure from the foreground invocation that just returned. + if (BACKGROUND_INTENT_PATTERN.test(responseText)) { + const activeTaskJob = findNewestActiveTaskJob(getHookCwd(input), input?.session_id); + if (activeTaskJob) { + return buildWatcherContext(activeTaskJob.id); + } + } + + // No success evidence: never claim success here. A failure status or an empty + // return gets a failure-aware line; everything else gets a neutral "exited, + // verify on disk" line. Every branch still says "do not wait" because a + // synchronous Agent return means the agent has exited. + if (isFailedOrEmptyReturn(toolResponse, responseText)) { + return FAILURE_CONTEXT; + } + return NEUTRAL_EXIT_CONTEXT; +} + +function main() { + const input = readHookInput(); + if (!isCodexRescueAgentToolUse(input)) { + return; + } + + const context = buildCompletionContext(input); + if (context) { + emitAdditionalContext(context); + } +} + +try { + main(); +} catch (error) { + process.stderr.write(`${error instanceof Error ? error.message : String(error)}\n`); + process.exitCode = 1; +} diff --git a/plugins/codex/scripts/lib/app-server.mjs b/plugins/codex/scripts/lib/app-server.mjs index 127c8376..3324497a 100644 --- a/plugins/codex/scripts/lib/app-server.mjs +++ b/plugins/codex/scripts/lib/app-server.mjs @@ -183,6 +183,8 @@ class SpawnedCodexAppServerClient extends AppServerClientBase { constructor(cwd, options = {}) { super(cwd, options); this.transport = "direct"; + // Holds the per-child process exit listener so close() and child shutdown can remove it exactly once. + this._exitReaper = null; } async initialize() { @@ -194,6 +196,34 @@ class SpawnedCodexAppServerClient extends AppServerClientBase { windowsHide: true }); + // Leak fix: a non-detached codex app-server child is NOT reaped by the OS when + // this host process exits normally. Guarantee we kill OUR OWN child on host exit + // (exit handlers must be synchronous). We never scan/kill processes we did not + // spawn -- parallel sessions own their own app-servers. + this._exitReaper = () => { + try { + if (this.proc && this.proc.exitCode === null && !this.proc.killed) { + if (process.platform === "win32") { + // PR #346 review (P1): with shell:true, this.proc is cmd.exe and the + // real codex app-server is its child; a flat SIGKILL would orphan it. + // terminateProcessTree is synchronous (taskkill /T /F), so it is safe + // in an exit handler and kills the whole tree. + terminateProcessTree(this.proc.pid); + } else { + // Direct (non-shell) child: SIGKILL it; the orphan is reparented to + // PID 1 and reaped by init (launchd/systemd) — see the reply on the + // P2 zombie note for the no-reaper-container caveat. + this.proc.kill("SIGKILL"); + } + } + } catch { + // best-effort during host teardown + } + }; + process.once("exit", this._exitReaper); + // Do not add SIGTERM/SIGINT handlers here: non-exiting signal listeners would keep + // plain CLI hosts alive, while existing process-tree shutdown already covers signals. + this.proc.stdout.setEncoding("utf8"); this.proc.stderr.setEncoding("utf8"); @@ -202,10 +232,12 @@ class SpawnedCodexAppServerClient extends AppServerClientBase { }); this.proc.on("error", (error) => { + this.removeExitReaper(); this.handleExit(error); }); this.proc.on("exit", (code, signal) => { + this.removeExitReaper(); const detail = code === 0 ? null @@ -225,9 +257,20 @@ class SpawnedCodexAppServerClient extends AppServerClientBase { this.notify("initialized", {}); } + removeExitReaper() { + // The exit listener belongs to this spawned child only; removing it prevents + // listener buildup in long-lived hosts after the child has already stopped. + if (!this._exitReaper) { + return; + } + process.removeListener("exit", this._exitReaper); + this._exitReaper = null; + } + async close() { if (this.closed) { await this.exitPromise; + this.removeExitReaper(); return; } @@ -259,6 +302,7 @@ class SpawnedCodexAppServerClient extends AppServerClientBase { } await this.exitPromise; + this.removeExitReaper(); } sendMessage(message) { diff --git a/plugins/codex/scripts/lib/codex.mjs b/plugins/codex/scripts/lib/codex.mjs index f2fe88bd..3c101fc4 100644 --- a/plugins/codex/scripts/lib/codex.mjs +++ b/plugins/codex/scripts/lib/codex.mjs @@ -77,8 +77,15 @@ function buildResumeParams(threadId, cwd, options = {}) { } /** @returns {UserInput[]} */ -function buildTurnInput(prompt) { - return [{ type: "text", text: prompt, text_elements: [] }]; +function buildTurnInput(prompt, options = {}) { + const input = []; + const boundaryNote = String(options.boundaryNote ?? "").trim(); + if (boundaryNote) { + // The workspace-boundary notice is ephemeral and exists only in the turn input sent to Codex. + input.push({ type: "text", text: boundaryNote, text_elements: [] }); + } + input.push({ type: "text", text: prompt, text_elements: [] }); + return input; } function shorten(text, limit = 72) { @@ -1004,7 +1011,7 @@ export async function runAppServerTurn(cwd, options = {}) { () => client.request("turn/start", { threadId, - input: buildTurnInput(prompt), + input: buildTurnInput(prompt, { boundaryNote: options.boundaryNote }), model: options.model ?? null, effort: options.effort ?? null, outputSchema: options.outputSchema ?? null diff --git a/plugins/codex/scripts/lib/render.mjs b/plugins/codex/scripts/lib/render.mjs index 2ec18523..7bd62cee 100644 --- a/plugins/codex/scripts/lib/render.mjs +++ b/plugins/codex/scripts/lib/render.mjs @@ -1,3 +1,8 @@ +import { + appendTaskStatusToken, + buildTaskCompleteStatusToken +} from "./task-status-token.mjs"; + function severityRank(severity) { switch (severity) { case "critical": @@ -314,12 +319,15 @@ export function renderNativeReviewResult(result, meta) { export function renderTaskResult(parsedResult, meta) { const rawOutput = typeof parsedResult?.rawOutput === "string" ? parsedResult.rawOutput : ""; + const completeToken = buildTaskCompleteStatusToken(); + // PR #346 review: only successful Codex task exits may stamp the success sentinel. + const succeeded = parsedResult?.status === 0 || parsedResult?.exitStatus === 0 || meta?.success === true; if (rawOutput) { - return rawOutput.endsWith("\n") ? rawOutput : `${rawOutput}\n`; + return succeeded ? appendTaskStatusToken(rawOutput, completeToken) : rawOutput.endsWith("\n") ? rawOutput : `${rawOutput}\n`; } const message = String(parsedResult?.failureMessage ?? "").trim() || "Codex did not return a final message."; - return `${message}\n`; + return succeeded ? appendTaskStatusToken(message, completeToken) : `${message}\n`; } export function renderStatusReport(report) { diff --git a/plugins/codex/scripts/lib/task-status-token.mjs b/plugins/codex/scripts/lib/task-status-token.mjs new file mode 100644 index 00000000..d97d275e --- /dev/null +++ b/plugins/codex/scripts/lib/task-status-token.mjs @@ -0,0 +1,45 @@ +const DISPATCHED_JOB_ID_PATTERN = /^[A-Za-z0-9._:-]+$/; +const DISPATCHED_TOKEN_PATTERN = /^\[\[codex-task status=dispatched id=([A-Za-z0-9._:-]+)\]\]$/; + +export const TASK_COMPLETE_STATUS_TOKEN = "[[codex-task status=complete]]"; + +// The companion and PostToolUse hook share this standalone-line sentinel scheme. +// A line is machine-readable only when it exactly matches one of these forms: +// [[codex-task status=complete]] +// [[codex-task status=dispatched id=]] +// The fixed prefix, status key, and closing brackets make the marker greppable +// and keep normal Codex prose from being treated as authoritative state. +export function buildTaskCompleteStatusToken() { + return TASK_COMPLETE_STATUS_TOKEN; +} + +export function buildTaskDispatchedStatusToken(jobId) { + const normalizedJobId = String(jobId ?? "").trim(); + if (!DISPATCHED_JOB_ID_PATTERN.test(normalizedJobId)) { + throw new Error(`Invalid Codex task job id for status token: ${normalizedJobId || "(empty)"}`); + } + return `[[codex-task status=dispatched id=${normalizedJobId}]]`; +} + +export function appendTaskStatusToken(output, token) { + const text = String(output ?? ""); + const normalizedOutput = text === "" || text.endsWith("\n") ? text : `${text}\n`; + return `${normalizedOutput}${token}\n`; +} + +export function parseTaskStatusToken(output) { + let parsed = null; + for (const line of String(output ?? "").split(/\r?\n/)) { + const trimmedLine = line.trim(); + if (trimmedLine === TASK_COMPLETE_STATUS_TOKEN) { + parsed = { status: "complete", id: null }; + continue; + } + + const dispatchedMatch = trimmedLine.match(DISPATCHED_TOKEN_PATTERN); + if (dispatchedMatch) { + parsed = { status: "dispatched", id: dispatchedMatch[1] }; + } + } + return parsed; +} diff --git a/plugins/codex/skills/codex-cli-runtime/SKILL.md b/plugins/codex/skills/codex-cli-runtime/SKILL.md index 0e91bfb5..7e571435 100644 --- a/plugins/codex/skills/codex-cli-runtime/SKILL.md +++ b/plugins/codex/skills/codex-cli-runtime/SKILL.md @@ -25,7 +25,10 @@ Execution rules: Command selection: - Use exactly one `task` invocation per rescue handoff. -- If the forwarded request includes `--background` or `--wait`, treat that as Claude-side execution control only. Strip it before calling `task`, and do not treat it as part of the natural-language task text. +- If the forwarded request includes `--background`, pass that execution mode through by invoking `task --background`; strip the flag token only from the natural-language prompt text. +- If the forwarded request includes `--wait`, invoke foreground `task`; do not pass `--wait` to `task`, and strip the flag token only from the natural-language prompt text. +- Do not drop an explicit execution mode while stripping tokens from prompt text. +- Foreground task runs are capped by the Bash tool at roughly 600 seconds; past that, they can be auto-backgrounded and orphaned, so long work belongs in `task --background`. - If the forwarded request includes `--model`, normalize `spark` to `gpt-5.3-codex-spark` and pass it through to `task`. - If the forwarded request includes `--effort`, pass it through to `task`. - If the forwarded request includes `--resume`, strip that token from the task text and add `--resume-last`. diff --git a/plugins/codex/skills/codex-result-handling/SKILL.md b/plugins/codex/skills/codex-result-handling/SKILL.md index e1896548..d29f0e47 100644 --- a/plugins/codex/skills/codex-result-handling/SKILL.md +++ b/plugins/codex/skills/codex-result-handling/SKILL.md @@ -16,6 +16,9 @@ When the helper returns Codex output: - If Codex made edits, say so explicitly and list the touched files when the helper provides them. - For `codex:codex-rescue`, do not turn a failed or incomplete Codex run into a Claude-side implementation attempt. Report the failure and stop. - For `codex:codex-rescue`, if Codex was never successfully invoked, do not generate a substitute answer at all. +- For `codex:codex-rescue`, treat a standalone `[[codex-task status=complete]]` sentinel, or the matching PostToolUse hook context, as authoritative proof that the synchronous subagent has finished and exited. Do not wait for a notification or poll status after that signal. +- For `codex:codex-rescue`, treat a standalone `[[codex-task status=dispatched id=]]` sentinel as a background-dispatch signal, not a completion signal. No automatic push notification will arrive; arm `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" status --wait --timeout-ms 1800000` via the Bash tool with `run_in_background=true` to be re-invoked on terminal status. If it returns while the job is still running, re-arm the same command. Never treat a dispatched/background job as done until that watcher returns a terminal status. - CRITICAL: After presenting review findings, STOP. Do not make any code changes. Do not fix any issues. You MUST explicitly ask the user which issues, if any, they want fixed before touching a single file. Auto-applying fixes from a review is strictly forbidden, even if the fix is obvious. - If the helper reports malformed output or a failed Codex run, include the most actionable stderr lines and stop there instead of guessing. - If the helper reports that setup or authentication is required, direct the user to `/codex:setup` and do not improvise alternate auth flows. +- The Claude Code harness appends an `agentId: (use SendMessage with to: '' to continue this agent)` line to every `Agent(codex:codex-rescue)` return. This is a resume token for continuing the same agent thread, NOT a "still running" or "in background" status signal. By the time you see it, the agent has finished and the work is done. Do not paraphrase it as "Codex is running in background" or "task dispatched async" — read the stdout above the suffix for the real completion state, and verify with `git status --short` if the agent had `--write`. diff --git a/tests/codex-rescue-hook.test.mjs b/tests/codex-rescue-hook.test.mjs new file mode 100644 index 00000000..5567481e --- /dev/null +++ b/tests/codex-rescue-hook.test.mjs @@ -0,0 +1,564 @@ +import path from "node:path"; +import test from "node:test"; +import assert from "node:assert/strict"; +import { fileURLToPath } from "node:url"; + +import { makeTempDir, run } from "./helpers.mjs"; +import { saveState } from "../plugins/codex/scripts/lib/state.mjs"; + +const ROOT = path.resolve(path.dirname(fileURLToPath(import.meta.url)), ".."); +const HOOK = path.join(ROOT, "plugins", "codex", "scripts", "codex-rescue-completion-hook.mjs"); +const COMPANION = path.join(ROOT, "plugins", "codex", "scripts", "codex-companion.mjs"); + +function withPluginData(pluginDataDir, callback) { + const previousPluginDataDir = process.env.CLAUDE_PLUGIN_DATA; + process.env.CLAUDE_PLUGIN_DATA = pluginDataDir; + try { + return callback(); + } finally { + if (previousPluginDataDir == null) { + delete process.env.CLAUDE_PLUGIN_DATA; + } else { + process.env.CLAUDE_PLUGIN_DATA = previousPluginDataDir; + } + } +} + +function seedHookState(cwd, pluginDataDir, jobs) { + withPluginData(pluginDataDir, () => + saveState(cwd, { + version: 1, + config: { stopReviewGate: false }, + jobs + }) + ); +} + +function runHook(input, options = {}) { + const pluginDataDir = options.pluginDataDir ?? makeTempDir(); + return run("node", [HOOK], { + cwd: ROOT, + env: { + ...process.env, + ...(options.env ?? {}), + CLAUDE_PLUGIN_DATA: pluginDataDir + }, + input: JSON.stringify(input) + }); +} + +function parseHookOutput(result) { + assert.equal(result.status, 0, result.stderr); + assert.ok(result.stdout.trim(), "expected hook to emit JSON"); + return JSON.parse(result.stdout); +} + +function expectedWatcherContext(jobId) { + return `codex-rescue background job ${jobId} is RUNNING — there is no automatic push notification. To be notified, arm a watcher: run this via the Bash tool with run_in_background=true: node "${COMPANION}" status ${jobId} --wait --timeout-ms 1800000 — it blocks until the job is terminal, then exits and re-invokes you. If it returns and the job is still running, re-arm the same command. Do NOT treat the job as done until the watcher reports a terminal status.`; +} + +test("codex-rescue hook emits a watcher when this return has background phrasing and active task state", () => { + const workspace = makeTempDir(); + const pluginDataDir = makeTempDir(); + seedHookState(workspace, pluginDataDir, [ + { + id: "task-queued-old", + status: "queued", + title: "Codex Task", + jobClass: "task", + sessionId: "session-main", + updatedAt: "2026-05-25T10:00:00.000Z" + }, + { + id: "review-running-newer", + status: "running", + title: "Codex Review", + jobClass: "review", + updatedAt: "2026-05-25T10:10:00.000Z" + }, + { + id: "task-running-new", + status: "running", + title: "Codex Task", + jobClass: "task", + sessionId: "session-main", + updatedAt: "2026-05-25T10:05:00.000Z" + } + ]); + + const result = runHook( + { + hook_event_name: "PostToolUse", + tool_name: "Agent", + session_id: "session-main", + cwd: workspace, + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "completed", + agentId: "agent-paraphrased-dispatch", + content: [ + { + type: "text", + text: "Codex is running in background (ID task-running-new)." + } + ] + } + }, + { pluginDataDir } + ); + + const payload = parseHookOutput(result); + assert.deepEqual(payload, { + hookSpecificOutput: { + hookEventName: "PostToolUse", + additionalContext: expectedWatcherContext("task-running-new") + } + }); + assert.ok(payload.hookSpecificOutput.additionalContext.includes(`node "${COMPANION}" status task-running-new`)); +}); + +test("codex-rescue hook does not infer active state without background evidence in this return", () => { + const workspace = makeTempDir(); + const pluginDataDir = makeTempDir(); + seedHookState(workspace, pluginDataDir, [ + { + id: "task-unrelated-running", + status: "running", + title: "Codex Task", + jobClass: "task", + sessionId: "session-main", + updatedAt: "2026-05-25T10:05:00.000Z" + } + ]); + + const result = runHook( + { + hook_event_name: "PostToolUse", + tool_name: "Agent", + session_id: "session-main", + cwd: workspace, + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "failed", + agentId: "agent-sentinel-less-failure", + content: [ + { + type: "text", + text: "Codex could not be invoked.\n" + } + ] + } + }, + { pluginDataDir } + ); + + const payload = parseHookOutput(result); + assert.equal( + payload.hookSpecificOutput.additionalContext, + "codex-rescue exited WITHOUT a success signal — it is not running, so do not wait for a notification, but the run may have failed or produced no result. Review the output above and `git status`, then re-run or escalate instead of treating it as done." + ); + assert.doesNotMatch(payload.hookSpecificOutput.additionalContext, /background job task-unrelated-running/); +}); + +test("codex-rescue hook injects a complete line when the completion token is present", () => { + const result = runHook({ + hook_event_name: "PostToolUse", + tool_name: "Agent", + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "completed", + agentId: "agent-complete", + content: [ + { + type: "text", + text: "Handled the requested task.\n[[codex-task status=complete]]\n" + } + ] + } + }); + + const payload = parseHookOutput(result); + assert.deepEqual(payload, { + hookSpecificOutput: { + hookEventName: "PostToolUse", + additionalContext: + "codex-rescue has COMPLETED and exited. The text above is the final result. Do NOT wait for a notification or poll status. If it ran with --write, verify changed files with git status." + } + }); +}); + +test("codex-rescue hook reports failure (not success) when a failed return echoes the complete token", () => { + const result = runHook({ + hook_event_name: "PostToolUse", + tool_name: "Agent", + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "failed", + agentId: "agent-failed-echo-complete", + content: [ + { + type: "text", + text: "The original prompt mentioned this literal token:\n[[codex-task status=complete]]\nThe run still failed.\n" + } + ] + } + }); + + const payload = parseHookOutput(result); + const additionalContext = payload.hookSpecificOutput.additionalContext; + assert.match(additionalContext, /exited WITHOUT a success signal/); + assert.doesNotMatch(additionalContext, /COMPLETED and exited/); +}); + +test("codex-rescue hook treats the completion token as authoritative over active task state", () => { + const workspace = makeTempDir(); + const pluginDataDir = makeTempDir(); + seedHookState(workspace, pluginDataDir, [ + { + id: "task-unrelated-running", + status: "running", + title: "Codex Task", + jobClass: "task", + sessionId: "other-session", + updatedAt: "2026-05-25T10:20:00.000Z" + } + ]); + + const result = runHook( + { + hook_event_name: "PostToolUse", + tool_name: "Agent", + cwd: workspace, + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "completed", + agentId: "agent-complete-with-unrelated-active-job", + content: [ + { + type: "text", + text: "Handled the requested task.\n[[codex-task status=complete]]\n" + } + ] + } + }, + { pluginDataDir } + ); + + const payload = parseHookOutput(result); + assert.equal( + payload.hookSpecificOutput.additionalContext, + "codex-rescue has COMPLETED and exited. The text above is the final result. Do NOT wait for a notification or poll status. If it ran with --write, verify changed files with git status." + ); + assert.doesNotMatch(payload.hookSpecificOutput.additionalContext, /background job task-unrelated-running/); +}); + +test("codex-rescue hook scopes background-evidenced active task state to the current session", () => { + const workspace = makeTempDir(); + const pluginDataDir = makeTempDir(); + seedHookState(workspace, pluginDataDir, [ + { + id: "task-current-session", + status: "running", + title: "Codex Task", + jobClass: "task", + sessionId: "session-current", + updatedAt: "2026-05-25T10:00:00.000Z" + }, + { + id: "task-other-session-newer", + status: "running", + title: "Codex Task", + jobClass: "task", + sessionId: "session-other", + updatedAt: "2026-05-25T10:30:00.000Z" + } + ]); + + const scopedResult = runHook( + { + hook_event_name: "PostToolUse", + tool_name: "Agent", + session_id: "session-current", + cwd: workspace, + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "completed", + agentId: "agent-tokenless-current-session", + content: [ + { + type: "text", + text: "Codex is still running in the background." + } + ] + } + }, + { pluginDataDir } + ); + + const scopedPayload = parseHookOutput(scopedResult); + assert.equal(scopedPayload.hookSpecificOutput.additionalContext, expectedWatcherContext("task-current-session")); + assert.doesNotMatch(scopedPayload.hookSpecificOutput.additionalContext, /task-other-session-newer/); + + const unscopedResult = runHook( + { + hook_event_name: "PostToolUse", + tool_name: "Agent", + cwd: workspace, + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "completed", + agentId: "agent-tokenless-without-session", + content: [ + { + type: "text", + text: "Codex is still running in the background." + } + ] + } + }, + { pluginDataDir } + ); + + const unscopedPayload = parseHookOutput(unscopedResult); + assert.equal(unscopedPayload.hookSpecificOutput.additionalContext, expectedWatcherContext("task-other-session-newer")); +}); + +test("codex-rescue hook reports synchronous tokenless returns neutrally", () => { + const result = runHook({ + hook_event_name: "PostToolUse", + tool_name: "Agent", + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "completed", + agentId: "agent-tokenless", + content: [ + { + type: "text", + text: "Handled the requested task.\n" + } + ] + } + }); + + const payload = parseHookOutput(result); + assert.equal( + payload.hookSpecificOutput.additionalContext, + "codex-rescue has exited (synchronous return — it is not running). Treat the text above as its result and verify on disk (`git status` if it ran with --write); do not wait for a notification or poll status." + ); + assert.doesNotMatch(payload.hookSpecificOutput.additionalContext, /final result/); +}); + +test("codex-rescue hook reports failed tokenless returns without a success signal", () => { + const result = runHook({ + hook_event_name: "PostToolUse", + tool_name: "Agent", + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "failed", + agentId: "agent-failed", + content: [ + { + type: "text", + text: "Codex could not be invoked.\n" + } + ] + } + }); + + const payload = parseHookOutput(result); + assert.equal( + payload.hookSpecificOutput.additionalContext, + "codex-rescue exited WITHOUT a success signal — it is not running, so do not wait for a notification, but the run may have failed or produced no result. Review the output above and `git status`, then re-run or escalate instead of treating it as done." + ); +}); + +test("codex-rescue hook stays silent on sub_agent_entered handoff", () => { + const result = runHook({ + hook_event_name: "PostToolUse", + tool_name: "Agent", + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "sub_agent_entered", + agentId: "agent-interactive-handoff", + content: [ + { + type: "text", + text: "codex-rescue entered an interactive subagent handoff.\n" + } + ] + } + }); + + assert.equal(result.status, 0, result.stderr); + assert.equal(result.stdout, ""); +}); + +test("codex-rescue hook reports empty tokenless returns without a success signal", () => { + const result = runHook({ + hook_event_name: "PostToolUse", + tool_name: "Agent", + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "completed", + agentId: "agent-empty", + content: [ + { + type: "text", + text: "" + } + ] + } + }); + + const payload = parseHookOutput(result); + assert.equal( + payload.hookSpecificOutput.additionalContext, + "codex-rescue exited WITHOUT a success signal — it is not running, so do not wait for a notification, but the run may have failed or produced no result. Review the output above and `git status`, then re-run or escalate instead of treating it as done." + ); +}); + +test("codex-rescue hook reports Bash auto-background returns as still running detached", () => { + const workspace = makeTempDir(); + const pluginDataDir = makeTempDir(); + seedHookState(workspace, pluginDataDir, [ + { + id: "task-unrelated-running", + status: "running", + title: "Codex Task", + jobClass: "task", + sessionId: "session-main", + updatedAt: "2026-05-25T10:05:00.000Z" + } + ]); + + const result = runHook( + { + hook_event_name: "PostToolUse", + tool_name: "Agent", + session_id: "session-main", + cwd: workspace, + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "completed", + agentId: "agent-auto-backgrounded", + content: [ + { + type: "text", + text: + "Command running in background with ID: bash-123. Output is being written to: /tmp/codex-rescue-output.log. You will be notified when it completes. To check interim output, use Read on that file path." + } + ] + } + }, + { pluginDataDir } + ); + + const payload = parseHookOutput(result); + assert.equal( + payload.hookSpecificOutput.additionalContext, + "codex-rescue's Codex run exceeded the foreground time cap and was auto-backgrounded by the Bash tool; it is STILL RUNNING detached. No completion notification will arrive. Do not wait passively — re-check `git status` (if it ran with --write) and/or read the streamed output at /tmp/codex-rescue-output.log until the run lands, then act on the result." + ); + assert.doesNotMatch(payload.hookSpecificOutput.additionalContext, /COMPLETED and exited/); + assert.doesNotMatch(payload.hookSpecificOutput.additionalContext, /background job task-unrelated-running/); +}); + +test("codex-rescue hook injects a dispatched line when the dispatched token is present", () => { + const result = runHook({ + hook_event_name: "PostToolUse", + tool_name: "Agent", + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "completed", + agentId: "agent-dispatched", + content: [ + { + type: "text", + text: "[[codex-task status=dispatched id=task-abc123]]\nCodex Task dispatched as background job task-abc123.\n" + } + ] + } + }); + + const payload = parseHookOutput(result); + assert.deepEqual(payload, { + hookSpecificOutput: { + hookEventName: "PostToolUse", + additionalContext: expectedWatcherContext("task-abc123") + } + }); +}); + +test("codex-rescue hook reports failure (not dispatch) when a failed return echoes a dispatched token", () => { + const result = runHook({ + hook_event_name: "PostToolUse", + tool_name: "Agent", + tool_input: { + subagent_type: "codex:codex-rescue" + }, + tool_response: { + status: "failed", + agentId: "agent-failed-echo-dispatched", + content: [ + { + type: "text", + text: + "The model output echoed a dispatch example:\n[[codex-task status=dispatched id=task-xxxx]]\nNo worker was actually launched.\n" + } + ] + } + }); + + const payload = parseHookOutput(result); + const additionalContext = payload.hookSpecificOutput.additionalContext; + assert.match(additionalContext, /exited WITHOUT a success signal/); + assert.doesNotMatch(additionalContext, /status .* --wait/); +}); + +test("codex-rescue hook ignores other subagent types", () => { + const result = runHook({ + hook_event_name: "PostToolUse", + tool_name: "Agent", + tool_input: { + subagent_type: "superpowers:code-reviewer" + }, + tool_response: { + status: "completed", + agentId: "agent-reviewer", + content: [ + { + type: "text", + text: "[[codex-task status=complete]]\n" + } + ] + } + }); + + assert.equal(result.status, 0, result.stderr); + assert.equal(result.stdout, ""); +}); diff --git a/tests/commands.test.mjs b/tests/commands.test.mjs index 3724ffa4..66de748a 100644 --- a/tests/commands.test.mjs +++ b/tests/commands.test.mjs @@ -108,9 +108,23 @@ test("rescue command absorbs continue semantics", () => { assert.match(rescue, /AskUserQuestion/); assert.match(rescue, /Continue current Codex thread/); assert.match(rescue, /Start a new Codex thread/); - assert.match(rescue, /run the `codex:codex-rescue` subagent in the background/i); - assert.match(rescue, /default to foreground/i); - assert.match(rescue, /Do not forward them to `task`/i); + assert.match(rescue, /pass `--background` to the `codex:codex-rescue` subagent/i); + assert.match(rescue, /For long, substantial, or open-ended rescues, pass `--background`/i); + assert.match(rescue, /status --wait --timeout-ms 1800000/); + assert.match(rescue, /run_in_background=true/); + assert.match(rescue, /`\/codex:result `/); + // Fix: a foreground task call inside the subagent is auto-backgrounded and + // reaped at ~143s (kills the worker mid-run). Default must be --background. + assert.match(rescue, /default to `--background`/i); + assert.match(rescue, /Only stay foreground when `--wait` is explicit/i); + assert.match(rescue, /running inside the subagent is auto-backgrounded by the harness and then reaped/i); + assert.match(rescue, /observed at ~143s/i); + assert.doesNotMatch(rescue, /foreground rescue is capped by the Bash tool at roughly 600 seconds/i); + assert.match(agent, /Default to `task --background` whenever the user did NOT explicitly pass `--wait`/i); + assert.match(agent, /reaped when the subagent ends \(observed at ~143s\)/i); + assert.doesNotMatch(agent, /capped by the Bash tool at roughly 600 seconds/i); + assert.match(rescue, /Preserve the execution choice/i); + assert.match(rescue, /strip them only from the natural-language task text/i); assert.match(rescue, /`--model` and `--effort` are runtime-selection flags/i); assert.match(rescue, /Leave `--effort` unset unless the user explicitly asks for a specific reasoning effort/i); assert.match(rescue, /If they ask for `spark`, map it to `gpt-5\.3-codex-spark`/i); @@ -121,13 +135,19 @@ test("rescue command absorbs continue semantics", () => { assert.match(rescue, /thin forwarder only/i); assert.match(rescue, /Return the Codex companion stdout verbatim to the user/i); assert.match(rescue, /Do not paraphrase, summarize, rewrite, or add commentary before or after it/i); + assert.match(rescue, /\[\[codex-task status=complete\]\]/); + assert.match(rescue, /\[\[codex-task status=dispatched id=\]\]/); + assert.match(rescue, /Trust this sentinel, companion state, and the PostToolUse hook over any prose/i); assert.match(rescue, /return that command's stdout as-is/i); assert.match(rescue, /Leave `--resume` and `--fresh` in the forwarded request/i); assert.match(agent, /--resume/); assert.match(agent, /--fresh/); assert.match(agent, /thin forwarding wrapper/i); - assert.match(agent, /prefer foreground for a small, clearly bounded rescue request/i); - assert.match(agent, /If the user did not explicitly choose `--background` or `--wait` and the task looks complicated, open-ended, multi-step, or likely to keep Codex running for a long time, prefer background execution/i); + assert.match(agent, /Explicit `--background` in the request means invoke `task --background`/i); + assert.match(agent, /Explicit `--wait` means invoke foreground `task`/i); + assert.match(agent, /ONLY for a short, clearly bounded request/i); + assert.match(agent, /Default to `task --background` whenever the user did NOT explicitly pass `--wait`/i); + assert.match(agent, /never run substantial work foreground from here/i); assert.match(agent, /Use exactly one `Bash` call/i); assert.match(agent, /Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, cancel jobs, summarize output, or do any follow-up work of your own/i); assert.match(agent, /Do not call `review`, `adversarial-review`, `status`, `result`, or `cancel`/i); @@ -147,8 +167,14 @@ test("rescue command absorbs continue semantics", () => { assert.match(runtimeSkill, /Leave `--effort` unset unless the user explicitly requests a specific effort/i); assert.match(runtimeSkill, /Leave model unset by default/i); assert.match(runtimeSkill, /Map `spark` to `--model gpt-5\.3-codex-spark`/i); - assert.match(runtimeSkill, /If the forwarded request includes `--background` or `--wait`, treat that as Claude-side execution control only/i); - assert.match(runtimeSkill, /Strip it before calling `task`/i); + assert.match(runtimeSkill, /If the forwarded request includes `--background`, pass that execution mode through by invoking `task --background`/i); + assert.match(runtimeSkill, /strip the flag token only from the natural-language prompt text/i); + assert.match(runtimeSkill, /If the forwarded request includes `--wait`, invoke foreground `task`/i); + assert.match(runtimeSkill, /do not pass `--wait` to `task`/i); + assert.match(runtimeSkill, /Do not drop an explicit execution mode while stripping tokens from prompt text/i); + assert.match(runtimeSkill, /Foreground task runs are capped by the Bash tool/i); + assert.match(runtimeSkill, /auto-backgrounded and orphaned/i); + assert.match(runtimeSkill, /long work belongs in `task --background`/i); assert.match(runtimeSkill, /`--effort`: accepted values are `none`, `minimal`, `low`, `medium`, `high`, `xhigh`/i); assert.match(runtimeSkill, /Do not inspect the repository, read files, grep, monitor progress, poll status, fetch results, cancel jobs, summarize output, or do any follow-up work of your own/i); assert.match(runtimeSkill, /If the Bash call fails or Codex cannot be invoked, return nothing/i); @@ -179,6 +205,10 @@ test("result and cancel commands are exposed as deterministic runtime entrypoint assert.match(cancel, /codex-companion\.mjs" cancel "\$ARGUMENTS"/); assert.match(resultHandling, /do not turn a failed or incomplete Codex run into a Claude-side implementation attempt/i); assert.match(resultHandling, /if Codex was never successfully invoked, do not generate a substitute answer at all/i); + assert.match(resultHandling, /\[\[codex-task status=complete\]\]/); + assert.match(resultHandling, /No automatic push notification will arrive/i); + assert.match(resultHandling, /status --wait --timeout-ms 1800000/); + assert.match(resultHandling, /run_in_background=true/); }); test("internal docs use task terminology for rescue runs", () => { @@ -200,6 +230,9 @@ test("hooks keep session-end cleanup and stop gating enabled", () => { const source = read("hooks/hooks.json"); assert.match(source, /SessionStart/); assert.match(source, /SessionEnd/); + assert.match(source, /PostToolUse/); + assert.match(source, /matcher":\s*"Agent"/); + assert.match(source, /codex-rescue-completion-hook\.mjs/); assert.match(source, /stop-review-gate-hook\.mjs/); assert.match(source, /session-lifecycle-hook\.mjs/); }); diff --git a/tests/fake-codex-fixture.mjs b/tests/fake-codex-fixture.mjs index debcadce..c1938c9b 100644 --- a/tests/fake-codex-fixture.mjs +++ b/tests/fake-codex-fixture.mjs @@ -380,6 +380,7 @@ rl.on("line", (line) => { turnId, model: message.params.model ?? null, effort: message.params.effort ?? null, + input: message.params.input ?? [], prompt }; saveState(state); diff --git a/tests/render.test.mjs b/tests/render.test.mjs index ab68038e..745c2fe7 100644 --- a/tests/render.test.mjs +++ b/tests/render.test.mjs @@ -1,7 +1,9 @@ import test from "node:test"; import assert from "node:assert/strict"; -import { renderReviewResult, renderStoredJobResult } from "../plugins/codex/scripts/lib/render.mjs"; +import { renderReviewResult, renderStoredJobResult, renderTaskResult } from "../plugins/codex/scripts/lib/render.mjs"; + +const TASK_COMPLETE_STATUS_TOKEN = "[[codex-task status=complete]]"; test("renderReviewResult degrades gracefully when JSON is missing required review fields", () => { const output = renderReviewResult( @@ -57,3 +59,27 @@ test("renderStoredJobResult prefers rendered output for structured review jobs", assert.match(output, /Codex session ID: thr_123/); assert.match(output, /Resume in Codex: codex resume thr_123/); }); + +test("renderTaskResult only stamps the complete sentinel for successful task runs", () => { + const successful = renderTaskResult( + { + rawOutput: "Handled the requested task.", + status: 0 + }, + {} + ); + + assert.equal(successful, `Handled the requested task.\n${TASK_COMPLETE_STATUS_TOKEN}\n`); + + const failedEmpty = renderTaskResult( + { + rawOutput: "", + failureMessage: "Codex failed before producing a final message.", + status: 1 + }, + {} + ); + + assert.equal(failedEmpty, "Codex failed before producing a final message.\n"); + assert.doesNotMatch(failedEmpty, /\[\[codex-task status=complete\]\]/); +}); diff --git a/tests/runtime.test.mjs b/tests/runtime.test.mjs index 90408372..9e3012ca 100644 --- a/tests/runtime.test.mjs +++ b/tests/runtime.test.mjs @@ -2,11 +2,12 @@ import fs from "node:fs"; import path from "node:path"; import test from "node:test"; import assert from "node:assert/strict"; -import { spawn } from "node:child_process"; -import { fileURLToPath } from "node:url"; +import { spawn, spawnSync } from "node:child_process"; +import { fileURLToPath, pathToFileURL } from "node:url"; import { buildEnv, installFakeCodex } from "./fake-codex-fixture.mjs"; import { initGitRepo, makeTempDir, run } from "./helpers.mjs"; +import { CodexAppServerClient } from "../plugins/codex/scripts/lib/app-server.mjs"; import { loadBrokerSession, saveBrokerSession } from "../plugins/codex/scripts/lib/broker-lifecycle.mjs"; import { resolveStateDir } from "../plugins/codex/scripts/lib/state.mjs"; @@ -16,6 +17,17 @@ const SCRIPT = path.join(PLUGIN_ROOT, "scripts", "codex-companion.mjs"); const STOP_HOOK = path.join(PLUGIN_ROOT, "scripts", "stop-review-gate-hook.mjs"); const SESSION_HOOK = path.join(PLUGIN_ROOT, "scripts", "session-lifecycle-hook.mjs"); +function writePreloadScript(dir, source) { + const preloadPath = path.join(dir, "preload.cjs"); + fs.writeFileSync(preloadPath, source, "utf8"); + return preloadPath; +} + +function extendNodeOptions(preloadPath) { + const previous = process.env.NODE_OPTIONS ? `${process.env.NODE_OPTIONS} ` : ""; + return `${previous}--require=${preloadPath}`; +} + async function waitFor(predicate, { timeoutMs = 5000, intervalMs = 50 } = {}) { const start = Date.now(); while (Date.now() - start < timeoutMs) { @@ -28,6 +40,51 @@ async function waitFor(predicate, { timeoutMs = 5000, intervalMs = 50 } = {}) { throw new Error("Timed out waiting for condition."); } +function seedRunningTaskJob(workspace, { id, pid }) { + // Status reads the summary state row first and the reconciler re-reads the + // stored job file before persisting a terminal state. + const stateDir = resolveStateDir(workspace); + const jobsDir = path.join(stateDir, "jobs"); + fs.mkdirSync(jobsDir, { recursive: true }); + + const logFile = path.join(jobsDir, `${id}.log`); + const job = { + id, + status: "running", + phase: "running", + title: "Codex Task", + jobClass: "task", + summary: "Investigate flaky test", + pid, + logFile, + createdAt: "2026-03-18T15:30:00.000Z", + startedAt: "2026-03-18T15:30:01.000Z", + updatedAt: "2026-03-18T15:30:02.000Z" + }; + + fs.writeFileSync(logFile, "[2026-03-18T15:30:00.000Z] Starting Codex Task.\n", "utf8"); + fs.writeFileSync(path.join(jobsDir, `${id}.json`), `${JSON.stringify(job, null, 2)}\n`, "utf8"); + fs.writeFileSync( + path.join(stateDir, "state.json"), + `${JSON.stringify({ version: 1, config: { stopReviewGate: false }, jobs: [job] }, null, 2)}\n`, + "utf8" + ); + + return { stateDir, jobsDir, job, logFile }; +} + +function isPidDead(pid) { + try { + process.kill(pid, 0); + return false; + } catch (error) { + if (error?.code === "ESRCH") { + return true; + } + throw error; + } +} + test("setup reports ready when fake codex is installed and authenticated", () => { const binDir = makeTempDir(); installFakeCodex(binDir); @@ -366,7 +423,7 @@ test("task --resume-last resumes the latest persisted task thread", () => { }); assert.equal(result.status, 0, result.stderr); - assert.equal(result.stdout, "Resumed the prior run.\nFollow-up prompt accepted.\n"); + assert.equal(result.stdout, "Resumed the prior run.\nFollow-up prompt accepted.\n[[codex-task status=complete]]\n"); }); test("task-resume-candidate returns the latest rescue thread from the current session", () => { @@ -577,7 +634,190 @@ test("write task output focuses on the Codex result without generic follow-up hi }); assert.equal(result.status, 0, result.stderr); - assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n"); + assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n[[codex-task status=complete]]\n"); +}); + +test("foreground task enqueues a detached worker and prints the stored result", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + installFakeCodex(binDir); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const result = run("node", [SCRIPT, "task", "--write", "fix the failing test"], { + cwd: repo, + env: buildEnv(binDir) + }); + + assert.equal(result.status, 0, result.stderr); + assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n[[codex-task status=complete]]\n"); + + const stateDir = resolveStateDir(repo); + const state = JSON.parse(fs.readFileSync(path.join(stateDir, "state.json"), "utf8")); + const job = state.jobs.find((candidate) => candidate.jobClass === "task"); + assert.ok(job, "expected foreground task to create a tracked task job"); + assert.equal(job.status, "completed"); + + const storedJob = JSON.parse(fs.readFileSync(path.join(stateDir, "jobs", `${job.id}.json`), "utf8")); + const log = fs.readFileSync(storedJob.logFile, "utf8"); + assert.equal(storedJob.request.jobId, job.id); + assert.equal(storedJob.request.prompt, "fix the failing test"); + assert.match(log, /Queued for background execution\./); + assert.equal(result.stdout, storedJob.rendered); +}); + +test("foreground task worker wait is not bounded by the default status timeout", () => { + const source = fs.readFileSync(SCRIPT, "utf8"); + const match = source.match(/async function runForegroundTaskWorker[\s\S]*?\n}\n\nasync function handleTask/); + + assert.ok(match, "expected to locate runForegroundTaskWorker"); + assert.match(match[0], /timeoutMs:\s*Infinity/); + assert.match(match[0], /pollIntervalMs:\s*FOREGROUND_TASK_POLL_INTERVAL_MS/); +}); + +test("foreground task reports an error when the detached worker pid is already dead", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + installFakeCodex(binDir); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const preloadPath = writePreloadScript( + binDir, + ` +const childProcess = require("node:child_process"); +const { syncBuiltinESMExports } = require("node:module"); +const originalSpawn = childProcess.spawn; +childProcess.spawn = function patchedSpawn(command, args, options) { + if (Array.isArray(args) && args[1] === "task-worker") { + return { pid: 987654321, unref() {} }; + } + return originalSpawn.apply(this, arguments); +}; +syncBuiltinESMExports(); +` + ); + + const result = spawnSync(process.execPath, [SCRIPT, "task", "--write", "fix the failing test"], { + cwd: repo, + env: { + ...buildEnv(binDir), + NODE_OPTIONS: extendNodeOptions(preloadPath) + }, + encoding: "utf8", + timeout: 3000, + windowsHide: true + }); + + assert.notEqual(result.error?.code, "ETIMEDOUT", "foreground task hung waiting for a dead worker pid"); + assert.notEqual(result.status, 0); + assert.match(result.stderr, /background worker exited before completing; check \/codex:status/); + + const stateDir = resolveStateDir(repo); + const state = JSON.parse(fs.readFileSync(path.join(stateDir, "state.json"), "utf8")); + const job = state.jobs.find((candidate) => candidate.jobClass === "task"); + assert.equal(job.status, "failed"); + assert.equal(job.pid, null); + assert.match(job.errorMessage, /background worker exited before completing/); +}); + +test("write task run prepends the workspace-boundary notice to the turn input", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + const statePath = path.join(binDir, "fake-codex-state.json"); + installFakeCodex(binDir); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const result = run("node", [SCRIPT, "task", "--write", "fix the failing test"], { + cwd: repo, + env: buildEnv(binDir) + }); + + assert.equal(result.status, 0, result.stderr); + const fakeState = JSON.parse(fs.readFileSync(statePath, "utf8")); + assert.equal(fakeState.lastTurnStart.input.length, 2); + assert.equal(fakeState.lastTurnStart.input[0].type, "text"); + assert.match(fakeState.lastTurnStart.input[0].text, /write access ONLY within/); + assert.match(fakeState.lastTurnStart.input[0].text, /Do NOT probe/); + assert.match(fakeState.lastTurnStart.input[0].text, /--cwd/); + assert.match(fakeState.lastTurnStart.input[0].text, new RegExp(repo.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))); + assert.deepEqual(fakeState.lastTurnStart.input[1], { + type: "text", + text: "fix the failing test", + text_elements: [] + }); +}); + +test("the workspace-boundary notice is ephemeral for write task runs", async () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + const statePath = path.join(binDir, "fake-codex-state.json"); + installFakeCodex(binDir, "slow-task"); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const launched = run("node", [SCRIPT, "task", "--background", "--write", "--json", "implement scoped update"], { + cwd: repo, + env: buildEnv(binDir) + }); + + assert.equal(launched.status, 0, launched.stderr); + const launchPayload = JSON.parse(launched.stdout); + const jobFile = path.join(resolveStateDir(repo), "jobs", `${launchPayload.jobId}.json`); + const queuedJob = JSON.parse(fs.readFileSync(jobFile, "utf8")); + assert.equal(queuedJob.request.prompt, "implement scoped update"); + assert.doesNotMatch(queuedJob.request.prompt, /WORKSPACE:/); + assert.doesNotMatch(queuedJob.request.prompt, /write access ONLY within/); + + await waitFor(() => { + const storedJob = JSON.parse(fs.readFileSync(jobFile, "utf8")); + return storedJob.status === "completed" ? storedJob : null; + }, { timeoutMs: 15000 }); + + const completedJob = JSON.parse(fs.readFileSync(jobFile, "utf8")); + assert.equal(completedJob.request.prompt, "implement scoped update"); + assert.doesNotMatch(completedJob.request.prompt, /WORKSPACE:/); + + const fakeState = JSON.parse(fs.readFileSync(statePath, "utf8")); + const taskThread = fakeState.threads.find((thread) => thread.id === fakeState.lastTurnStart.threadId); + assert.match(taskThread.name, /^Codex Companion Task: implement scoped update/); + assert.doesNotMatch(taskThread.name, /WORKSPACE:/); + assert.doesNotMatch(taskThread.name, /write access ONLY within/); +}); + +test("read-only task run does not get the workspace-boundary notice", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + const statePath = path.join(binDir, "fake-codex-state.json"); + installFakeCodex(binDir); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const result = run("node", [SCRIPT, "task", "diagnose the failing test"], { + cwd: repo, + env: buildEnv(binDir) + }); + + assert.equal(result.status, 0, result.stderr); + const fakeState = JSON.parse(fs.readFileSync(statePath, "utf8")); + assert.deepEqual(fakeState.lastTurnStart.input, [ + { + type: "text", + text: "diagnose the failing test", + text_elements: [] + } + ]); }); test("task --resume acts like --resume-last without leaking the flag into the prompt", () => { @@ -715,7 +955,7 @@ test("task waits for the main thread to complete before returning the final resu }); assert.equal(result.status, 0, result.stderr); - assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n"); + assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n[[codex-task status=complete]]\n"); }); test("task ignores later subagent messages when choosing the final returned output", () => { @@ -733,7 +973,7 @@ test("task ignores later subagent messages when choosing the final returned outp }); assert.equal(result.status, 0, result.stderr); - assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n"); + assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n[[codex-task status=complete]]\n"); }); test("task can finish after subagent work even if the parent turn/completed event is missing", () => { @@ -751,7 +991,7 @@ test("task can finish after subagent work even if the parent turn/completed even }); assert.equal(result.status, 0, result.stderr); - assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n"); + assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n[[codex-task status=complete]]\n"); }); test("task using the shared broker still completes when Codex spawns subagents", () => { @@ -781,7 +1021,7 @@ test("task using the shared broker still completes when Codex spawns subagents", }); assert.equal(result.status, 0, result.stderr); - assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n"); + assert.equal(result.stdout, "Handled the requested task.\nTask prompt accepted.\n[[codex-task status=complete]]\n"); }); test("task --background enqueues a detached worker and exposes per-job status", async () => { @@ -803,6 +1043,17 @@ test("task --background enqueues a detached worker and exposes per-job status", assert.equal(launchPayload.status, "queued"); assert.match(launchPayload.jobId, /^task-/); + const stateDir = resolveStateDir(repo); + const launchedState = JSON.parse(fs.readFileSync(path.join(stateDir, "state.json"), "utf8")); + const launchedJob = launchedState.jobs.find((candidate) => candidate.id === launchPayload.jobId); + assert.ok(Number.isInteger(launchedJob?.pid) && launchedJob.pid > 0, "expected launched job to record worker pid"); + + const launchedStoredJob = JSON.parse(fs.readFileSync(path.join(stateDir, "jobs", `${launchPayload.jobId}.json`), "utf8")); + assert.ok( + Number.isInteger(launchedStoredJob.pid) && launchedStoredJob.pid > 0, + "expected stored launched job to record worker pid" + ); + const waitedStatus = run( "node", [SCRIPT, "status", launchPayload.jobId, "--wait", "--timeout-ms", "15000", "--json"], @@ -833,6 +1084,532 @@ test("task --background enqueues a detached worker and exposes per-job status", assert.match(resultPayload.storedJob.rendered, /Handled the requested task/); }); +test("task --background skips spawning when the queued job was cancelled before worker launch", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + const spawnMarker = path.join(binDir, "task-worker-spawned.json"); + installFakeCodex(binDir, "slow-task"); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const preloadPath = writePreloadScript( + binDir, + ` +const fs = require("node:fs"); +const path = require("node:path"); +const childProcess = require("node:child_process"); +const { syncBuiltinESMExports } = require("node:module"); +const spawnMarker = ${JSON.stringify(spawnMarker)}; +const originalWriteFileSync = fs.writeFileSync; +const originalSpawn = childProcess.spawn; +let rewriting = false; + +function cancelQueuedTask(job) { + return { + ...job, + status: "cancelled", + phase: "cancelled", + pid: null, + errorMessage: "Cancelled before worker launch.", + cancelledAt: new Date().toISOString() + }; +} + +function rewriteJson(filePath, payload) { + rewriting = true; + try { + originalWriteFileSync(filePath, JSON.stringify(payload, null, 2) + "\\n", "utf8"); + } finally { + rewriting = false; + } +} + +fs.writeFileSync = function patchedWriteFileSync(file, data, options) { + originalWriteFileSync.apply(this, arguments); + if (rewriting) { + return; + } + + const filePath = String(file); + if (!filePath.endsWith(".json")) { + return; + } + + let parsed; + try { + parsed = JSON.parse(String(data)); + } catch { + return; + } + + if (filePath.includes(path.sep + "jobs" + path.sep) && parsed?.jobClass === "task" && parsed.status === "queued") { + rewriteJson(filePath, cancelQueuedTask(parsed)); + return; + } + + if (filePath.endsWith(path.sep + "state.json") && Array.isArray(parsed.jobs)) { + let changed = false; + const jobs = parsed.jobs.map((job) => { + if (job?.jobClass !== "task" || job.status !== "queued") { + return job; + } + changed = true; + return cancelQueuedTask(job); + }); + if (changed) { + rewriteJson(filePath, { ...parsed, jobs }); + } + } +}; + +childProcess.spawn = function patchedSpawn(command, args, options) { + if (Array.isArray(args) && args[1] === "task-worker") { + originalWriteFileSync(spawnMarker, JSON.stringify({ command, args }, null, 2) + "\\n", "utf8"); + return { pid: 424242, unref() {} }; + } + return originalSpawn.apply(this, arguments); +}; + +syncBuiltinESMExports(); +` + ); + + const launched = run("node", [SCRIPT, "task", "--background", "--json", "investigate the failing test"], { + cwd: repo, + env: { + ...buildEnv(binDir), + NODE_OPTIONS: extendNodeOptions(preloadPath) + } + }); + + // PR #346 review: a cancelled launch dispatched no live job, so exit is non-zero. + assert.notEqual(launched.status, 0, launched.stderr); + const launchPayload = JSON.parse(launched.stdout); + assert.equal(launchPayload.status, "cancelled"); + assert.equal(fs.existsSync(spawnMarker), false, "expected cancellation guard to skip task-worker spawn"); + + const stateDir = resolveStateDir(repo); + const job = JSON.parse(fs.readFileSync(path.join(stateDir, "state.json"), "utf8")).jobs.find( + (candidate) => candidate.id === launchPayload.jobId + ); + assert.equal(job.status, "cancelled"); + assert.equal(job.pid, null); + + const storedJob = JSON.parse(fs.readFileSync(path.join(stateDir, "jobs", `${launchPayload.jobId}.json`), "utf8")); + assert.equal(storedJob.status, "cancelled"); + assert.equal(storedJob.pid, null); +}); + +test("task --background does not persist a worker pid when cancellation wins after spawn", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + const killMarker = path.join(binDir, "task-worker-killed.json"); + installFakeCodex(binDir, "slow-task"); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const preloadPath = writePreloadScript( + binDir, + ` +const fs = require("node:fs"); +const path = require("node:path"); +const childProcess = require("node:child_process"); +const processModule = require("node:process"); +const { syncBuiltinESMExports } = require("node:module"); +const killMarker = ${JSON.stringify(killMarker)}; +const originalReadFileSync = fs.readFileSync; +const originalWriteFileSync = fs.writeFileSync; +const originalSpawn = childProcess.spawn; +const originalKill = processModule.kill; +let spawnedTaskWorker = false; +let injectedCancellation = false; +let rewriting = false; + +function encodeLikeOriginal(value, options) { + return options === "utf8" || options?.encoding === "utf8" ? value : Buffer.from(value); +} + +function cancelTaskJob(job) { + return { + ...job, + status: "cancelled", + phase: "cancelled", + pid: null, + errorMessage: "Cancelled after worker spawn.", + cancelledAt: new Date().toISOString() + }; +} + +function writeJson(filePath, payload) { + rewriting = true; + try { + originalWriteFileSync(filePath, JSON.stringify(payload, null, 2) + "\\n", "utf8"); + } finally { + rewriting = false; + } +} + +fs.readFileSync = function patchedReadFileSync(file, options) { + const filePath = String(file); + const data = originalReadFileSync.apply(this, arguments); + if (!spawnedTaskWorker || injectedCancellation || rewriting || !filePath.endsWith(path.sep + "state.json")) { + return data; + } + + let parsed; + try { + parsed = JSON.parse(String(data)); + } catch { + return data; + } + + if (!Array.isArray(parsed.jobs)) { + return data; + } + + let cancelledJob = null; + const jobs = parsed.jobs.map((job) => { + if (!cancelledJob && job?.jobClass === "task" && job.status === "queued") { + cancelledJob = cancelTaskJob(job); + return cancelledJob; + } + return job; + }); + if (!cancelledJob) { + return data; + } + + injectedCancellation = true; + const cancelledState = { ...parsed, jobs }; + writeJson(filePath, cancelledState); + writeJson(path.join(path.dirname(filePath), "jobs", cancelledJob.id + ".json"), cancelledJob); + return encodeLikeOriginal(JSON.stringify(cancelledState, null, 2) + "\\n", options); +}; + +childProcess.spawn = function patchedSpawn(command, args, options) { + if (Array.isArray(args) && args[1] === "task-worker") { + spawnedTaskWorker = true; + return { + pid: 424242, + unref() {}, + once() { + return this; + }, + on() { + return this; + } + }; + } + return originalSpawn.apply(this, arguments); +}; + +processModule.kill = function patchedKill(pid, signal) { + if (pid === -424242 || pid === 424242) { + originalWriteFileSync(killMarker, JSON.stringify({ pid, signal }, null, 2) + "\\n", "utf8"); + return true; + } + return originalKill.apply(this, arguments); +}; + +syncBuiltinESMExports(); +` + ); + + const launched = run("node", [SCRIPT, "task", "--background", "--json", "investigate the failing test"], { + cwd: repo, + env: { + ...buildEnv(binDir), + NODE_OPTIONS: extendNodeOptions(preloadPath) + } + }); + + // PR #346 review: a cancelled launch dispatched no live job, so exit is non-zero. + assert.notEqual(launched.status, 0, launched.stderr); + const launchPayload = JSON.parse(launched.stdout); + assert.equal(launchPayload.status, "cancelled"); + assert.equal(fs.existsSync(killMarker), true, "expected the orphaned task-worker process group to be killed"); + + const stateDir = resolveStateDir(repo); + const job = JSON.parse(fs.readFileSync(path.join(stateDir, "state.json"), "utf8")).jobs.find( + (candidate) => candidate.id === launchPayload.jobId + ); + assert.equal(job.status, "cancelled"); + assert.equal(job.pid, null); + + const storedJob = JSON.parse(fs.readFileSync(path.join(stateDir, "jobs", `${launchPayload.jobId}.json`), "utf8")); + assert.equal(storedJob.status, "cancelled"); + assert.equal(storedJob.pid, null); +}); + +test("foreground task reports a structured failure when the detached worker spawn fails", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + installFakeCodex(binDir); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const preloadPath = writePreloadScript( + binDir, + ` +const childProcess = require("node:child_process"); +const { EventEmitter } = require("node:events"); +const { syncBuiltinESMExports } = require("node:module"); +const originalSpawn = childProcess.spawn; + +childProcess.spawn = function patchedSpawn(command, args, options) { + if (Array.isArray(args) && args[1] === "task-worker") { + const child = new EventEmitter(); + child.pid = undefined; + child.unref = function unref() {}; + process.nextTick(() => { + const error = new Error("spawn ENOENT task-worker"); + error.code = "ENOENT"; + child.emit("error", error); + }); + return child; + } + return originalSpawn.apply(this, arguments); +}; + +syncBuiltinESMExports(); +` + ); + + const result = spawnSync(process.execPath, [SCRIPT, "task", "--write", "--json", "fix the failing test"], { + cwd: repo, + env: { + ...buildEnv(binDir), + NODE_OPTIONS: extendNodeOptions(preloadPath) + }, + encoding: "utf8", + timeout: 3000, + windowsHide: true + }); + + assert.notEqual(result.error?.code, "ETIMEDOUT", "foreground task hung waiting for a worker that never launched"); + assert.notEqual(result.status, 0); + assert.doesNotMatch(result.stderr, /Unhandled 'error' event/); + + const payload = JSON.parse(result.stdout); + assert.equal(payload.job.status, "failed"); + assert.equal(payload.job.pid, null); + assert.match(payload.storedJob.errorMessage, /Failed to launch background worker|missing worker pid|spawn ENOENT/); +}); + +test("task --background exits non-zero when the worker spawn fails", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + installFakeCodex(binDir); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const preloadPath = writePreloadScript( + binDir, + ` +const childProcess = require("node:child_process"); +const { EventEmitter } = require("node:events"); +const { syncBuiltinESMExports } = require("node:module"); +const originalSpawn = childProcess.spawn; + +childProcess.spawn = function patchedSpawn(command, args, options) { + if (Array.isArray(args) && args[1] === "task-worker") { + const child = new EventEmitter(); + child.pid = undefined; + child.unref = function unref() {}; + process.nextTick(() => { + const error = new Error("spawn ENOENT task-worker"); + error.code = "ENOENT"; + child.emit("error", error); + }); + return child; + } + return originalSpawn.apply(this, arguments); +}; + +syncBuiltinESMExports(); +` + ); + + const result = spawnSync(process.execPath, [SCRIPT, "task", "--background", "--json", "fix the failing test"], { + cwd: repo, + env: { + ...buildEnv(binDir), + NODE_OPTIONS: extendNodeOptions(preloadPath) + }, + encoding: "utf8", + timeout: 3000, + windowsHide: true + }); + + assert.notEqual(result.error?.code, "ETIMEDOUT", "background task hung after a worker launch failure"); + assert.notEqual(result.status, 0); + assert.doesNotMatch(result.stderr, /Unhandled 'error' event/); + + const payload = JSON.parse(result.stdout); + assert.equal(payload.status, "failed"); +}); + +test("task --background cancelled before launch renders terminal text without a dispatched token", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + installFakeCodex(binDir, "slow-task"); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const preloadPath = writePreloadScript( + binDir, + ` +const fs = require("node:fs"); +const path = require("node:path"); +const childProcess = require("node:child_process"); +const { syncBuiltinESMExports } = require("node:module"); +const originalWriteFileSync = fs.writeFileSync; +const originalSpawn = childProcess.spawn; +let rewriting = false; + +function cancelQueuedTask(job) { + return { + ...job, + status: "cancelled", + phase: "cancelled", + pid: null, + errorMessage: "Cancelled before worker launch.", + cancelledAt: new Date().toISOString() + }; +} + +function rewriteJson(filePath, payload) { + rewriting = true; + try { + originalWriteFileSync(filePath, JSON.stringify(payload, null, 2) + "\\n", "utf8"); + } finally { + rewriting = false; + } +} + +fs.writeFileSync = function patchedWriteFileSync(file, data, options) { + originalWriteFileSync.apply(this, arguments); + if (rewriting) { + return; + } + + const filePath = String(file); + if (!filePath.endsWith(".json")) { + return; + } + + let parsed; + try { + parsed = JSON.parse(String(data)); + } catch { + return; + } + + if (filePath.includes(path.sep + "jobs" + path.sep) && parsed?.jobClass === "task" && parsed.status === "queued") { + rewriteJson(filePath, cancelQueuedTask(parsed)); + return; + } + + if (filePath.endsWith(path.sep + "state.json") && Array.isArray(parsed.jobs)) { + let changed = false; + const jobs = parsed.jobs.map((job) => { + if (job?.jobClass !== "task" || job.status !== "queued") { + return job; + } + changed = true; + return cancelQueuedTask(job); + }); + if (changed) { + rewriteJson(filePath, { ...parsed, jobs }); + } + } +}; + +childProcess.spawn = function patchedSpawn(command, args, options) { + if (Array.isArray(args) && args[1] === "task-worker") { + throw new Error("task-worker should not launch after cancellation"); + } + return originalSpawn.apply(this, arguments); +}; + +syncBuiltinESMExports(); +` + ); + + const launched = run("node", [SCRIPT, "task", "--background", "investigate the failing test"], { + cwd: repo, + env: { + ...buildEnv(binDir), + NODE_OPTIONS: extendNodeOptions(preloadPath) + } + }); + + // PR #346 review: a cancelled launch dispatched no live job, so exit is non-zero. + assert.notEqual(launched.status, 0, launched.stderr); + assert.doesNotMatch(launched.stdout, /\[\[codex-task status=dispatched/); + assert.doesNotMatch(launched.stdout, /dispatched as background job/); + assert.match(launched.stdout, /was cancelled before a worker launched; no work started\./); +}); + +test("task --background emits a dispatched token without promising notification", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + installFakeCodex(binDir, "slow-task"); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const launched = run("node", [SCRIPT, "task", "--background", "investigate the failing test"], { + cwd: repo, + env: buildEnv(binDir) + }); + + assert.equal(launched.status, 0, launched.stderr); + assert.match(launched.stdout, /^\[\[codex-task status=dispatched id=task-[^\]\s]+\]\]\n/); + const jobId = launched.stdout.match(/id=(task-[^\]\s]+)/)?.[1]; + assert.ok(jobId, "expected dispatched token to include a task job id"); + assert.match(launched.stdout, new RegExp(`No automatic notification will arrive; poll /codex:status ${jobId}\\.`)); + assert.match( + launched.stdout, + new RegExp( + `To be notified on completion, run with the Bash tool \\(run_in_background\\): node "\\$\\{CLAUDE_PLUGIN_ROOT\\}/scripts/codex-companion\\.mjs" status ${jobId} --wait --timeout-ms 1800000 \\(re-arm it if it returns still-running\\)\\.` + ) + ); + assert.doesNotMatch(launched.stdout, /will notify|will be notified|you will be notified/i); +}); + +test("task rejects conflicting background and wait flags before dispatching a job", () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + installFakeCodex(binDir, "slow-task"); + initGitRepo(repo); + fs.writeFileSync(path.join(repo, "README.md"), "hello\n"); + run("git", ["add", "README.md"], { cwd: repo }); + run("git", ["commit", "-m", "init"], { cwd: repo }); + + const result = run("node", [SCRIPT, "task", "--background", "--wait", "investigate the failing test"], { + cwd: repo, + env: buildEnv(binDir) + }); + + const stateFile = path.join(resolveStateDir(repo), "state.json"); + const stateJobs = fs.existsSync(stateFile) ? JSON.parse(fs.readFileSync(stateFile, "utf8")).jobs : []; + assert.deepEqual(stateJobs, []); + assert.notEqual(result.status, 0); + assert.match(result.stderr, /Choose either --background or --wait, not both\./); +}); + test("review rejects focus text because it is native-review only", () => { const repo = makeTempDir(); const binDir = makeTempDir(); @@ -1216,6 +1993,88 @@ test("status --wait times out cleanly when a job is still active", () => { assert.equal(payload.waitTimedOut, true); }); +test("status --wait reports a crashed worker instead of hanging when the worker pid is dead", () => { + const workspace = makeTempDir(); + const { stateDir } = seedRunningTaskJob(workspace, { id: "task-dead-wait", pid: 987654321 }); + + const result = spawnSync( + process.execPath, + [SCRIPT, "status", "task-dead-wait", "--wait", "--timeout-ms", "600000", "--json"], + { + cwd: workspace, + encoding: "utf8", + timeout: 3000, + windowsHide: true + } + ); + + assert.notEqual(result.error?.code, "ETIMEDOUT", "status --wait hung waiting for a dead worker pid"); + assert.equal(result.status, 0, result.stderr); + const payload = JSON.parse(result.stdout); + assert.equal(payload.job.id, "task-dead-wait"); + assert.equal(payload.job.status, "failed"); + assert.match(payload.job.errorMessage, /background worker exited before completing/); + // The failure message must carry the verify-on-disk warning: edits can still + // land via the shared app-server after the worker pid dies, so a single + // snapshot is not trustworthy. + assert.match(payload.job.errorMessage, /still be landing via the shared app-server|re-check disk state/i); + + const state = JSON.parse(fs.readFileSync(path.join(stateDir, "state.json"), "utf8")); + const job = state.jobs.find((candidate) => candidate.id === "task-dead-wait"); + assert.equal(job.status, "failed"); + assert.equal(job.pid, null); + assert.match(job.errorMessage, /background worker exited before completing/); +}); + +test("status (no --wait) reconciles a dead-worker job to failed", () => { + const workspace = makeTempDir(); + const { stateDir } = seedRunningTaskJob(workspace, { id: "task-dead-nowait", pid: 987654321 }); + + const result = run("node", [SCRIPT, "status", "task-dead-nowait", "--json"], { + cwd: workspace + }); + + assert.equal(result.status, 0, result.stderr); + const payload = JSON.parse(result.stdout); + assert.equal(payload.job.id, "task-dead-nowait"); + assert.equal(payload.job.status, "failed"); + assert.match(payload.job.errorMessage, /background worker exited before completing/); + + const state = JSON.parse(fs.readFileSync(path.join(stateDir, "state.json"), "utf8")); + const job = state.jobs.find((candidate) => candidate.id === "task-dead-nowait"); + assert.equal(job.status, "failed"); + assert.equal(job.pid, null); + assert.match(job.errorMessage, /background worker exited before completing/); +}); + +test("status --wait keeps waiting when the active worker pid is alive", () => { + const workspace = makeTempDir(); + const { stateDir } = seedRunningTaskJob(workspace, { id: "task-live-pid", pid: process.pid }); + + const result = spawnSync( + process.execPath, + [SCRIPT, "status", "task-live-pid", "--wait", "--timeout-ms", "60", "--json"], + { + cwd: workspace, + encoding: "utf8", + timeout: 3000, + windowsHide: true + } + ); + + assert.notEqual(result.error?.code, "ETIMEDOUT", "status --wait did not honor the short timeout"); + assert.equal(result.status, 0, result.stderr); + const payload = JSON.parse(result.stdout); + assert.equal(payload.job.id, "task-live-pid"); + assert.equal(payload.job.status, "running"); + assert.equal(payload.waitTimedOut, true); + + const state = JSON.parse(fs.readFileSync(path.join(stateDir, "state.json"), "utf8")); + const job = state.jobs.find((candidate) => candidate.id === "task-live-pid"); + assert.equal(job.status, "running"); + assert.equal(job.pid, process.pid); +}); + test("result returns the stored output for the latest finished job by default", () => { const workspace = makeTempDir(); const stateDir = resolveStateDir(workspace); @@ -1980,6 +2839,120 @@ test("stop hook runs the actual task when auth status looks stale", () => { assert.match(payload.reason, /Missing empty-state guard/i); }); +test("locally-spawned codex app-server is reaped when the host process exits without close()", async (t) => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + const hostScript = path.join(repo, "host-exits-with-direct-app-server.mjs"); + const appServerModule = pathToFileURL(path.join(PLUGIN_ROOT, "scripts", "lib", "app-server.mjs")).href; + let appServerPid = null; + + installFakeCodex(binDir, "interruptible-slow-task"); + + fs.writeFileSync( + hostScript, + `import { CodexAppServerClient } from ${JSON.stringify(appServerModule)}; + +const cwd = ${JSON.stringify(repo)}; +const client = await CodexAppServerClient.connect(cwd, { disableBroker: true, env: process.env }); +const thread = await client.request("thread/start", { cwd, ephemeral: true }); +await client.request("turn/start", { + threadId: thread.thread.id, + input: [{ type: "text", text: "keep the fake app-server alive until the host exits" }] +}); + +process.stdout.write(String(client.proc.pid) + "\\n", () => { + process.exit(0); +}); +`, + "utf8" + ); + + t.after(() => { + if (appServerPid !== null) { + try { + process.kill(appServerPid, "SIGKILL"); + } catch { + // Ignore a process that was already reaped by the exit handler under test. + } + } + }); + + const host = spawn(process.execPath, [hostScript], { + cwd: repo, + env: buildEnv(binDir), + stdio: ["ignore", "pipe", "pipe"] + }); + let stdout = ""; + let stderr = ""; + host.stdout.setEncoding("utf8"); + host.stderr.setEncoding("utf8"); + host.stdout.on("data", (chunk) => { + stdout += chunk; + }); + host.stderr.on("data", (chunk) => { + stderr += chunk; + }); + + const hostResult = await new Promise((resolve, reject) => { + const timeout = setTimeout(() => { + host.kill("SIGKILL"); + reject(new Error("Timed out waiting for host process to exit.")); + }, 5000); + host.on("error", (error) => { + clearTimeout(timeout); + reject(error); + }); + host.on("close", (code, signal) => { + clearTimeout(timeout); + resolve({ code, signal }); + }); + }); + + assert.equal(hostResult.code, 0, stderr); + assert.equal(hostResult.signal, null, stderr); + appServerPid = Number.parseInt(stdout.trim(), 10); + assert.equal(Number.isInteger(appServerPid), true, `invalid app-server pid output: ${stdout}`); + + await waitFor(() => isPidDead(appServerPid), { timeoutMs: 4000, intervalMs: 50 }); +}); + +test("close() removes the exit reaper for locally-spawned codex app-server clients", async () => { + const repo = makeTempDir(); + const binDir = makeTempDir(); + const beforeExitListeners = process.listenerCount("exit"); + let client = null; + let appServerPid = null; + + installFakeCodex(binDir); + + try { + client = await CodexAppServerClient.connect(repo, { + disableBroker: true, + env: buildEnv(binDir) + }); + appServerPid = client.proc.pid; + + assert.equal(client.transport, "direct"); + assert.equal(process.listenerCount("exit"), beforeExitListeners + 1); + + await client.close(); + + await waitFor(() => isPidDead(appServerPid), { timeoutMs: 4000, intervalMs: 50 }); + assert.equal(process.listenerCount("exit"), beforeExitListeners); + } finally { + if (client) { + await client.close().catch(() => {}); + } + if (appServerPid !== null) { + try { + process.kill(appServerPid, "SIGKILL"); + } catch { + // Ignore a process that close() or the child exit event already reaped. + } + } + } +}); + test("commands lazily start and reuse one shared app-server after first use", async () => { const repo = makeTempDir(); const binDir = makeTempDir();