feat(tasks): reviewer loop — independent verification on in_review (status-as-role) by Fullstop000 · Pull Request #206 · Fullstop000/Chorus

Fullstop000 · 2026-05-23T10:37:20Z

What

Adds independent verification to the autonomous task loop: a second agent (the reviewer) consumes a task in in_review and drives it to done (approve) or back to in_progress (send back, with notes), escalating to the human via dispatch_decision when it can't cleanly decide. This makes the single-agent loop trustworthy — agents are structurally bad at knowing when they've failed, so the reviewer is the second identity whose only power is judgment.

Design — status-as-role (no schema migration, no new route, no new MCP tool)

A task in in_review whose claim (claimed_by_id) is held by an agent other than its durable assignee_agent_id is under review. The role is the status. Assigning an agent to an in_review task (the existing assign route) = a review dispatch; the reviewer drives the existing claimer-gated update_task_status.

Three live states, distinguished everywhere (claimed_by_id vs assignee_agent_id):

working — in_progress, claimer == assignee
awaiting reviewer — in_review, claimer == assignee (no actor expected; lease must not fire)
reviewing — in_review, claimer != assignee (reviewer; lease applies)

Mechanics: ClaimMode::Review (claim without status change) · self-review rejected, review-assignment human-only · send-back rebounds the claim + sole membership to the worker inline (status event built from the final row) · lease generalized from assignee → claimer · reconcile re-pokes the claimer with role-appropriate text and is snapshot-guarded against list-then-act races · InReview → Blocked edge for reviewer-stall escalation · TaskInfo.reviewer_name + UI "in review by @Reviewer" shown only in the reviewing state.

Review

Three independent codex passes caught 5 real bugs, all fixed + regression-tested:

self-claimed tasks broke review (orphaned claim on send-back)
reviewer stalls blamed the worker
a human-claimer 500 in update_task_status (caught by the e2e suite)
changing reviewers corrupted the durable worker
(+ two stale LSP false-alarms correctly dismissed against cargo)

Dogfood (the real bar) — verified against the DB, not self-reported

Live run with real LLM agents (claude/sonnet; kimi token had expired). Every status_changed was authored by an agent (update_task_status is claimer-gated, so the human cannot produce them); the only human-authored events are review_requested (the assign action). Confirmed in SQLite:

Full send-back loop: worker→in_review, reviewer→in_review→in_progress (claim rebounds to worker), worker re-submits, reviewer→done.
Reviewer-stall → blocked correctly blamed the reviewer; awaiting-reviewer tasks were NOT escalated (three-state lease); dead-worker task also auto-blocked.

Gates

cargo fmt · cargo clippy --all-targets -D warnings · cargo test (837 passed, incl. e2e) · cd ui && npx tsc --noEmit · vitest (139 passed). 15 commits, +~1450/−46, 13 files.

🤖 Generated with Claude Code

…alation) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… caller arrives next) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When assign_task is called on a task that is already in_review, instead of forcing in_progress it dispatches the target agent as the REVIEWER: uses ClaimMode::Review (no status change), keeps the original worker as durable assignee, makes the reviewer sole agent sub-channel member, posts a review directive, and emits a ReviewRequested event. Human-only; worker cannot self-review. Removes #[allow(dead_code)] from ClaimMode::Review now that it has a caller. Adds three covering tests (happy path, self-review rejection, agent-assigner rejection). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…vent) When a reviewer calls update_task_status in_review→in_progress, the claim is handed back to the durable worker (set_claim_inner Assign), the reviewer is evicted from sub-channel membership, a send-back directive is posted into the sub-channel to wake the worker, and the StatusChanged event reflects the FINAL row claimedBy (the worker name, not the reviewer's). Existing Done archive path is unchanged. Two new tests cover approve and send-back. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace the assignee-keyed lease renewal UPDATE with a claimer-keyed one (claimed_by_id + claimed_by_type). During in_progress the claimer IS the assignee so behavior is unchanged; during in_review the claimer is the reviewer, so the reviewer's posts now renew the lease while the worker's posts no longer do. Test: reviewer_post_renews_lease_nonclaimer_does_not. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

update_task_status resolved the post-rebound claimer name always as an agent, which 500'd when the claimer is a human (operator-driven status changes). Track final_claimer_type and resolve with it. Caught by the e2e suite (test_task_board_e2e, task_lifecycle_*), which the reviewer unit tests missed because they only use agent claimers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ded) StaleTask gains status/claimed_by_id/claimed_by_type fields. list_stale_assigned_tasks now includes reviewing tasks (in_review, claimer != assignee) while excluding awaiting-reviewer tasks (in_review, claimer == assignee). redispatch_task is snapshot-guarded and branches directive text by state, re-poking the claimer. escalate_task is snapshot-guarded and derives prevStatus from the snapshot. Four tests cover the three-state predicate, branched directive, prevStatus, and the no-op guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds `reviewer_name: Option<String>` to `TaskInfo`, populated via a trailing SQL CASE column in all three task-read queries. Returns the reviewer agent's display name ONLY when status = in_review AND claimed_by_id != assignee_agent_id (the "reviewing" state); NULL in every other state including "awaiting reviewer". Column appended last so no existing index shifts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Documents and locks the existing update_task_status claimer gate: after a reviewer is dispatched (assign_task in_review branch), neither the human assigner nor the original worker may drive in_review → in_progress because neither holds the claim. No production change — the gate was already in place; this test makes the invariant explicit and regression-proof. Audit finding: the only SQL paths that write in_progress are set_claim_inner (Claim mode, CAS-guarded todo→in_progress only) and update_task_status (claimer-gated). escalate_task only goes to blocked; redispatch_task only bumps attempts. No admin/CLI/UI bypass exists. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The assign handler's assign_task call was using internal_err (500) for all unexpected errors. Three review-branch violations — self-review, human-only assigner, and review-claim-did-not-apply — are client errors (caller got the precondition wrong), so they map to 409 Conflict. Matches the existing done-task 409 guard's style: check the error string for the known substrings and route to app_err!(CONFLICT) before falling through to internal_err for genuinely unexpected store errors. Route: POST /api/conversations/{channel_id}/tasks/{task_number}/assign Body field: agentId Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@name

Add `review_requested` to `TaskEventAction` union and VALID_ACTIONS. Add `reviewerName?: string` to `TaskInfo`. Show 'in review by @name' in TaskDetailView and TaskCard when reviewerName is set; omit when absent. Add exhaustive switch arm in `formatEvent` ("requested review"). Tests cover parse, rendering present/absent reviewerName, and the new event label. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…(codex review) Two P2s from the end-of-branch codex review: - Self-claimed tasks have assignee_agent_id=NULL; the review branch keyed the worker off the durable assignee, so self-review wasn't caught and a send-back stranded the task on the reviewer (orphaned claim). Derive the worker from the current claimer (covers assigned + self-claimed) and promote it to the durable assignee on review dispatch. - escalate_task blamed the durable worker for a reviewer stall. Blame the actual staller (reviewer when reviewing, worker otherwise). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…-review) The prior fix derived the worker from the current claimer, which on a reviewer change is the OUTGOING reviewer, not the worker — corrupting the durable assignee and breaking send-back/self-review. Prefer the existing durable assignee when present; fall back to the claimer only for a self-claimed task's first review dispatch. Regression test added. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- CHANGELOG: reviewer loop (#206) - VERSION: 0.17.0 -> 0.18.0 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ference) (#207) * fix(decisions): resolve dispatch_decision agent by id, not name handle_create_decision looked up the agent with get_agent(name) while the bridge addresses the callback as /internal/agent/{agent_id}/decisions with the agent's UUID id (agents are keyed by id since #142). The id never matched the name column, so every real agent's dispatch_decision 500'd with "agent not found" — the resume path in the same file already used get_agent_by_id. Switch the create path to get_agent_by_id. The existing test masked it: its fixture seeds an agent whose name == id ("bot1"), so the by-name lookup happened to match. Added a regression test with a real UUID id != name (verified it 404s without the fix). Found by dogfooding the reviewer loop's escalation path (#206): a reviewer agent correctly chose to escalate via dispatch_decision and hit this. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(decisions): infer decision channel from durable state, not dead trace lookup dispatch_decision inferred its inbox channel from `lifecycle.run_channel_id`, backed by AgentTraceStore's in-memory agent-scoped channel. But `set_run_channel` had no non-test caller, so the lookup always returned None and every escalation 400'd ("no active-run channel"). The Decision Inbox never worked in production; tests passed only because the mock lifecycle set the channel by hand. Bug #1 (agent-by-name) failed earlier in the same handler and masked this. The channel is display context (inbox label) only — delivery is agent-scoped (resume_with_prompt), scoping is by workspace_id — so resolve it from durable reads: claimed-task sub-channel, else current-run channel, else last chat channel, else an honest 400. Task work drives status without a run-stamped chat, so the claimed task's sub-channel is the reliable origin for a task agent that escalates before chatting (the reviewer-loop case the dogfood hit). Delete the dead set_run_channel / run_channel_id from the AgentLifecycle trait, AgentManager, and the test mock — the trap that hid this. Add a regression test (decision_channel_resolves_from_claimed_task) exercising the production path with no mock channel; rework the existing decision tests off set_run_channel. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore: prep for v0.18.1 preview ship - CHANGELOG: dispatch_decision now reaches the Decision Inbox (#207) - VERSION: 0.18.0 -> 0.18.1 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Fullstop000 and others added 16 commits May 23, 2026 14:55

feat(tasks): allow InReview -> Blocked transition (reviewer-stall esc…

5867875

…alation) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(tasks): ReviewRequested task event

b661a34

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(tasks): ClaimMode::Review (claim without status change)

b9e6a63

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(tasks): suppress dead_code lint on ClaimMode::Review (groundwork,…

b530ef1

… caller arrives next) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: prep for v0.18.0 preview ship

00f6d5e

- CHANGELOG: reviewer loop (#206) - VERSION: 0.17.0 -> 0.18.0 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Fullstop000 merged commit d0cf64c into main May 23, 2026
3 checks passed

This was referenced May 23, 2026

fix(decisions): make dispatch_decision work (agent-by-id + channel inference) #207

Merged

First-level TASKS sidebar destination (agent-work triage surface) #208

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(tasks): reviewer loop — independent verification on in_review (status-as-role)#206

feat(tasks): reviewer loop — independent verification on in_review (status-as-role)#206
Fullstop000 merged 16 commits into
mainfrom
feat/reviewer-loop

Fullstop000 commented May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Fullstop000 commented May 23, 2026

What

Design — status-as-role (no schema migration, no new route, no new MCP tool)

Review

Dogfood (the real bar) — verified against the DB, not self-reported

Gates

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant