feat(tasks): reviewer loop — independent verification on in_review (status-as-role)#206
Merged
Conversation
…alation) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… caller arrives next) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When assign_task is called on a task that is already in_review, instead of forcing in_progress it dispatches the target agent as the REVIEWER: uses ClaimMode::Review (no status change), keeps the original worker as durable assignee, makes the reviewer sole agent sub-channel member, posts a review directive, and emits a ReviewRequested event. Human-only; worker cannot self-review. Removes #[allow(dead_code)] from ClaimMode::Review now that it has a caller. Adds three covering tests (happy path, self-review rejection, agent-assigner rejection). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vent) When a reviewer calls update_task_status in_review→in_progress, the claim is handed back to the durable worker (set_claim_inner Assign), the reviewer is evicted from sub-channel membership, a send-back directive is posted into the sub-channel to wake the worker, and the StatusChanged event reflects the FINAL row claimedBy (the worker name, not the reviewer's). Existing Done archive path is unchanged. Two new tests cover approve and send-back. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the assignee-keyed lease renewal UPDATE with a claimer-keyed one (claimed_by_id + claimed_by_type). During in_progress the claimer IS the assignee so behavior is unchanged; during in_review the claimer is the reviewer, so the reviewer's posts now renew the lease while the worker's posts no longer do. Test: reviewer_post_renews_lease_nonclaimer_does_not. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
update_task_status resolved the post-rebound claimer name always as an agent, which 500'd when the claimer is a human (operator-driven status changes). Track final_claimer_type and resolve with it. Caught by the e2e suite (test_task_board_e2e, task_lifecycle_*), which the reviewer unit tests missed because they only use agent claimers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ded) StaleTask gains status/claimed_by_id/claimed_by_type fields. list_stale_assigned_tasks now includes reviewing tasks (in_review, claimer != assignee) while excluding awaiting-reviewer tasks (in_review, claimer == assignee). redispatch_task is snapshot-guarded and branches directive text by state, re-poking the claimer. escalate_task is snapshot-guarded and derives prevStatus from the snapshot. Four tests cover the three-state predicate, branched directive, prevStatus, and the no-op guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds `reviewer_name: Option<String>` to `TaskInfo`, populated via a trailing SQL CASE column in all three task-read queries. Returns the reviewer agent's display name ONLY when status = in_review AND claimed_by_id != assignee_agent_id (the "reviewing" state); NULL in every other state including "awaiting reviewer". Column appended last so no existing index shifts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents and locks the existing update_task_status claimer gate: after a reviewer is dispatched (assign_task in_review branch), neither the human assigner nor the original worker may drive in_review → in_progress because neither holds the claim. No production change — the gate was already in place; this test makes the invariant explicit and regression-proof. Audit finding: the only SQL paths that write in_progress are set_claim_inner (Claim mode, CAS-guarded todo→in_progress only) and update_task_status (claimer-gated). escalate_task only goes to blocked; redispatch_task only bumps attempts. No admin/CLI/UI bypass exists. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The assign handler's assign_task call was using internal_err (500) for
all unexpected errors. Three review-branch violations — self-review,
human-only assigner, and review-claim-did-not-apply — are client errors
(caller got the precondition wrong), so they map to 409 Conflict.
Matches the existing done-task 409 guard's style: check the error string
for the known substrings and route to app_err!(CONFLICT) before falling
through to internal_err for genuinely unexpected store errors.
Route: POST /api/conversations/{channel_id}/tasks/{task_number}/assign
Body field: agentId
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `review_requested` to `TaskEventAction` union and VALID_ACTIONS. Add `reviewerName?: string` to `TaskInfo`. Show 'in review by @name' in TaskDetailView and TaskCard when reviewerName is set; omit when absent. Add exhaustive switch arm in `formatEvent` ("requested review"). Tests cover parse, rendering present/absent reviewerName, and the new event label. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…(codex review) Two P2s from the end-of-branch codex review: - Self-claimed tasks have assignee_agent_id=NULL; the review branch keyed the worker off the durable assignee, so self-review wasn't caught and a send-back stranded the task on the reviewer (orphaned claim). Derive the worker from the current claimer (covers assigned + self-claimed) and promote it to the durable assignee on review dispatch. - escalate_task blamed the durable worker for a reviewer stall. Blame the actual staller (reviewer when reviewing, worker otherwise). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-review) The prior fix derived the worker from the current claimer, which on a reviewer change is the OUTGOING reviewer, not the worker — corrupting the durable assignee and breaking send-back/self-review. Prefer the existing durable assignee when present; fall back to the claimer only for a self-claimed task's first review dispatch. Regression test added. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- CHANGELOG: reviewer loop (#206) - VERSION: 0.17.0 -> 0.18.0 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced May 23, 2026
Fullstop000
added a commit
that referenced
this pull request
May 23, 2026
…ference) (#207) * fix(decisions): resolve dispatch_decision agent by id, not name handle_create_decision looked up the agent with get_agent(name) while the bridge addresses the callback as /internal/agent/{agent_id}/decisions with the agent's UUID id (agents are keyed by id since #142). The id never matched the name column, so every real agent's dispatch_decision 500'd with "agent not found" — the resume path in the same file already used get_agent_by_id. Switch the create path to get_agent_by_id. The existing test masked it: its fixture seeds an agent whose name == id ("bot1"), so the by-name lookup happened to match. Added a regression test with a real UUID id != name (verified it 404s without the fix). Found by dogfooding the reviewer loop's escalation path (#206): a reviewer agent correctly chose to escalate via dispatch_decision and hit this. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(decisions): infer decision channel from durable state, not dead trace lookup dispatch_decision inferred its inbox channel from `lifecycle.run_channel_id`, backed by AgentTraceStore's in-memory agent-scoped channel. But `set_run_channel` had no non-test caller, so the lookup always returned None and every escalation 400'd ("no active-run channel"). The Decision Inbox never worked in production; tests passed only because the mock lifecycle set the channel by hand. Bug #1 (agent-by-name) failed earlier in the same handler and masked this. The channel is display context (inbox label) only — delivery is agent-scoped (resume_with_prompt), scoping is by workspace_id — so resolve it from durable reads: claimed-task sub-channel, else current-run channel, else last chat channel, else an honest 400. Task work drives status without a run-stamped chat, so the claimed task's sub-channel is the reliable origin for a task agent that escalates before chatting (the reviewer-loop case the dogfood hit). Delete the dead set_run_channel / run_channel_id from the AgentLifecycle trait, AgentManager, and the test mock — the trap that hid this. Add a regression test (decision_channel_resolves_from_claimed_task) exercising the production path with no mock channel; rework the existing decision tests off set_run_channel. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore: prep for v0.18.1 preview ship - CHANGELOG: dispatch_decision now reaches the Decision Inbox (#207) - VERSION: 0.18.0 -> 0.18.1 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds independent verification to the autonomous task loop: a second agent (the reviewer) consumes a task in
in_reviewand drives it todone(approve) or back toin_progress(send back, with notes), escalating to the human viadispatch_decisionwhen it can't cleanly decide. This makes the single-agent loop trustworthy — agents are structurally bad at knowing when they've failed, so the reviewer is the second identity whose only power is judgment.Design — status-as-role (no schema migration, no new route, no new MCP tool)
A task in
in_reviewwhose claim (claimed_by_id) is held by an agent other than its durableassignee_agent_idis under review. The role is the status. Assigning an agent to anin_reviewtask (the existing assign route) = a review dispatch; the reviewer drives the existing claimer-gatedupdate_task_status.Three live states, distinguished everywhere (
claimed_by_idvsassignee_agent_id):in_progress, claimer == assigneein_review, claimer == assignee (no actor expected; lease must not fire)in_review, claimer != assignee (reviewer; lease applies)Mechanics:
ClaimMode::Review(claim without status change) · self-review rejected, review-assignment human-only · send-back rebounds the claim + sole membership to the worker inline (status event built from the final row) · lease generalized from assignee → claimer · reconcile re-pokes the claimer with role-appropriate text and is snapshot-guarded against list-then-act races ·InReview → Blockededge for reviewer-stall escalation ·TaskInfo.reviewer_name+ UI "in review by @Reviewer" shown only in the reviewing state.Review
Three independent codex passes caught 5 real bugs, all fixed + regression-tested:
update_task_status(caught by the e2e suite)Dogfood (the real bar) — verified against the DB, not self-reported
Live run with real LLM agents (claude/sonnet; kimi token had expired). Every
status_changedwas authored by an agent (update_task_statusis claimer-gated, so the human cannot produce them); the only human-authored events arereview_requested(the assign action). Confirmed in SQLite:in_review, reviewer→in_review→in_progress(claim rebounds to worker), worker re-submits, reviewer→done.blockedcorrectly blamed the reviewer; awaiting-reviewer tasks were NOT escalated (three-state lease); dead-worker task also auto-blocked.Gates
cargo fmt·cargo clippy --all-targets -D warnings·cargo test(837 passed, incl. e2e) ·cd ui && npx tsc --noEmit·vitest(139 passed). 15 commits, +~1450/−46, 13 files.🤖 Generated with Claude Code