feat(engine): Retry affordance for a stuck autopilot goal #1148

Open
psdjungpulzze wants to merge 2 commits into main from feat/goal-retry-stuck-autopilot

@psdjungpulzze

Summary

Adds a goal-level Retry affordance to recover a stuck autopilot goal — one whose run failed and stalled on an open run_failure pause. Previously the only recovery was drilling into each failed task's /attention card and resuming its pause individually. This is the counterpart to the existing Resume goal (backlog → queued).

What "stuck" means

An autopilot goal is stuck when ≥1 of its tasks has an open run_failure pause (a run failed/abandoned and is waiting on a human). The planning task that scopes the goal, or any child task, can land here — leaving the goal stalled (and previously still showing the "scoping" spinner indefinitely).
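
The definition above can be sketched as a small predicate. This is only an illustration: the `Pause` shape, the `open` flag, and both function names are assumptions, not the project's actual model. It reflects the `run_failure`-only definition given here; the follow-up commit in this PR also counts `deliverable_incomplete`.

```typescript
// Hypothetical shapes for illustration; the real pause model is not shown in this PR.
type PauseKind = "run_failure" | "clarification" | "risk" | "standoff";

interface Pause {
  goalId: string;
  taskId: string;
  kind: PauseKind;
  open: boolean; // true while the pause is still waiting on a human
}

// Counts the open run_failure pauses that belong to one goal.
function countOpenRunFailures(pauses: Pause[], goalId: string): number {
  return pauses.filter(
    (p) => p.goalId === goalId && p.kind === "run_failure" && p.open,
  ).length;
}

// A goal is stuck when at least one of its tasks has an open run_failure pause.
function isGoalStuck(pauses: Pause[], goalId: string): boolean {
  return countOpenRunFailures(pauses, goalId) > 0;
}
```

Resolved pauses and the deliberate human gates (clarification, risk, standoff) do not contribute to the count, so a goal waiting only on those is not treated as stuck.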

Changes

  • retryGoalFailures(goalId, projectId, resolvedBy?) (src/lib/engine/pause.ts) — resumes every open run_failure pause under the goal via the existing resolvePause (so it inherits per-kind transition, audit, engine:pause-resolved publish, and transient-retry-budget reset), re-queuing each failed task. Scoped to run_failure only — clarification/risk/standoff pauses are deliberate human gates, not failures, so they stay on /attention for an explicit decision. Idempotent + best-effort (a pause another actor resolved in between is skipped); fires one coarse engine.changed.
  • POST /api/v2/projects/:projectId/goals/:goalId/retry — mirrors the resume route (auth, UUID-guard, no-leak 404), passing the resolving user as resolvedBy.
  • GoalDetailData.stuck.failedTasks — count of open run_failure pauses under the goal, computed in the one shared serializer (buildGoalDetailData) so the server-rendered page and the GET goal-read route agree byte-for-byte.
  • Detail UI (engine-goal-detail-client.tsx) — an Autopilot: stuck badge + a Retry N failed task(s) button, shown only when stuck; the "scoping" spinner is suppressed while stuck (a failed run isn't scoping). New STUCK_BADGE in the shared goal-badges vocabulary.
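
The control flow described for `retryGoalFailures` might look roughly like the sketch below. It is not the code in `src/lib/engine/pause.ts`: the `RetryDeps` bundle, the `OpenPause` shape, the boolean return of the `resolvePause` stand-in, and the synchronous style are all assumptions made so the example runs without a database. Skipping the publish when every candidate was already resolved is also an assumption; the description only states that nothing is published when nothing is stuck.

```typescript
type PauseKind = "run_failure" | "clarification" | "risk" | "standoff";

interface OpenPause {
  id: string;
  taskId: string;
  kind: PauseKind;
}

// Hypothetical dependency bundle so the sketch runs in memory.
interface RetryDeps {
  listOpenPauses(goalId: string, projectId: string): OpenPause[];
  // Stand-in for resolvePause: returns false if another actor already resolved it.
  resolvePause(pauseId: string, resolvedBy?: string): boolean;
  publish(event: string): void;
}

function retryGoalFailuresSketch(
  deps: RetryDeps,
  goalId: string,
  projectId: string,
  resolvedBy?: string,
): { resumed: number } {
  // Only failures are retried; human gates stay open for an explicit decision.
  const failures = deps
    .listOpenPauses(goalId, projectId)
    .filter((p) => p.kind === "run_failure");

  let resumed = 0;
  for (const pause of failures) {
    // Best-effort: a pause resolved in the meantime is skipped, not counted.
    if (deps.resolvePause(pause.id, resolvedBy)) resumed += 1;
  }

  // One coarse event for the whole batch, and none when nothing was resumed.
  if (resumed > 0) deps.publish("engine.changed");
  return { resumed };
}
```

Calling the function a second time finds no open `run_failure` pauses, resumes nothing, and publishes nothing, which is the idempotent behavior the description calls for.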

AI agent maintenance

No agent-config update needed: goal resume/retry are operator UI actions, not agent-exposed tools (no TOOL_NAMES entry references them).

Tests

  • pause.test.ts (retryGoalFailures): resumes all open run_failure pauses + re-queues each task + fires one engine.changed; no-op (no publish) when nothing is stuck; skips an already-resolved pause (idempotent, not counted).
  • goal-detail.test.ts: stuck.failedTasks counts open run_failure pauses scoped to the goal; zero when none.
  • engine-goal-detail-client.test.tsx — badge + button visibility/pluralization, scoping suppression while stuck, POST-to-retry-route + refetch, error path.
  • retry/route.test.ts — actor pass-through, no-leak 404s, auth pass-through, scoped lookup.

Full src/lib/engine suite (685 tests), the four touched suites, tsc --noEmit, and eslint on changed files all pass.
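
For readers unfamiliar with the "no-leak 404" pattern the route tests cover, here is a minimal framework-free sketch. Everything in it is hypothetical: the handler name, the injected `findGoal` and `retry` callbacks, the response shape, and the 401 and 200 status codes are assumptions, since the description only names auth, the UUID guard, and the 404 behavior.

```typescript
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

interface GoalRef {
  id: string;
  projectId: string;
}

type RetryResponse =
  | { status: 401 | 404; body: { error: string } }
  | { status: 200; body: { resumed: number } };

function handleRetrySketch(
  params: { projectId: string; goalId: string },
  userId: string | null,
  // Scoped lookup: must only return a goal that belongs to the given project.
  findGoal: (goalId: string, projectId: string) => GoalRef | undefined,
  retry: (goalId: string, projectId: string, resolvedBy: string) => number,
): RetryResponse {
  if (userId === null) {
    return { status: 401, body: { error: "Unauthorized" } };
  }

  // Malformed ids and unknown goals get the same response, so a caller
  // cannot tell whether a goal exists in some other project.
  const notFound = { status: 404 as const, body: { error: "Not found" } };
  if (!UUID_RE.test(params.projectId) || !UUID_RE.test(params.goalId)) {
    return notFound;
  }
  const goal = findGoal(params.goalId, params.projectId);
  if (!goal) return notFound;

  // The resolving user is passed through as resolvedBy.
  return { status: 200, body: { resumed: retry(goal.id, goal.projectId, userId) } };
}
```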

🤖 Generated with Claude Code

psdjungpulzze and others added 2 commits June 17, 2026 00:32
When an autopilot goal's run fails, the failed task is parked on an open
`run_failure` pause and the goal stalls — previously the only recovery was
drilling into each task's /attention card to resume its pause. This adds a
goal-level Retry affordance, the counterpart to the existing "Resume goal"
(backlog → queued).

- `retryGoalFailures(goalId, projectId, resolvedBy?)` resumes every open
  `run_failure` pause under the goal via the existing `resolvePause` (per-kind
  transition + audit + realtime + transient-budget reset), re-queuing each
  failed task. Scoped to `run_failure` only — clarification/risk/standoff
  pauses are deliberate human gates, not failures. Idempotent + best-effort;
  fires one coarse engine.changed.
- POST /api/v2/projects/:projectId/goals/:goalId/retry — mirrors the resume
  route (auth, UUID-guard, no-leak 404), passing the actor as resolvedBy.
- GoalDetailData gains `stuck.failedTasks` (count of open run_failure pauses
  under the goal), computed in the one shared serializer so page + GET route
  agree. The detail surface shows an "Autopilot: stuck" badge + a Retry button
  (only when stuck), and suppresses the "scoping" spinner while stuck.

No agent-config update: goal resume/retry are not agent-exposed tools.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…just run_failure

Review rework: the goal-level Retry resumed ONLY open `run_failure` pauses, so a
goal stuck on a `deliverable_incomplete` pause stayed stuck and showed no Retry
button. The acceptance criterion requires resuming every open pause of kind
run_failure OR deliverable_incomplete.

- retryGoalFailures (pause.ts): filter `kind: { in: [run_failure,
  deliverable_incomplete] }`. resolveTaskTransition already re-queues
  deliverable_incomplete identically, so the resume path Just Works once selected.
- buildGoalDetailData (goal-detail.ts): count both kinds for the `stuck.failedTasks`
  signal so the Retry button shows/enables when the goal's only open pauses are
  deliverable_incomplete.
- Sync doc comments + STUCK_BADGE/route/client copy to both failure kinds.
- Tests: assert the two-kind filter; add deliverable_incomplete resume coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
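
The two-kind selection from this rework can be illustrated in memory as follows. This is a sketch under assumptions: the commit applies the filter in a database query, whereas the `RETRYABLE_KINDS` constant, the `OpenPause` shape, and both helper names below are invented for the example.

```typescript
// The two pause kinds the reworked Retry treats as failures.
const RETRYABLE_KINDS = ["run_failure", "deliverable_incomplete"] as const;

type RetryableKind = (typeof RETRYABLE_KINDS)[number];
type PauseKind = RetryableKind | "clarification" | "risk" | "standoff";

// Hypothetical shape; only open pauses are assumed to be passed in.
interface OpenPause {
  taskId: string;
  kind: PauseKind;
}

function isRetryableKind(kind: PauseKind): kind is RetryableKind {
  return RETRYABLE_KINDS.some((k) => k === kind);
}

// Selects the pauses a goal-level Retry would resume.
function selectRetryable(pauses: OpenPause[]): OpenPause[] {
  return pauses.filter((p) => isRetryableKind(p.kind));
}

// Mirrors the stuck.failedTasks signal once both kinds are counted.
function countFailedTasks(pauses: OpenPause[]): number {
  return selectRetryable(pauses).length;
}
```

Sharing one list of kinds between the resume path and the count keeps the Retry button and the action it triggers from drifting apart, which is the mismatch this commit fixes.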