
Reviewer calibration gaps: aggregate over-decomposition + strictness tunable #8

Description

@n1arash

Summary

The queue reviewer and plan reviewer have calibration gaps. The queue reviewer judges each slice's coherence but not aggregate over-decomposition, and plan-reviewer strictness is load-bearing but not exposed as a tunable.

⚠️ The reviewer in the dogfood run was a synthetic LLM judge, not a human. These are real calibration signals, but human review DX is deferred to an attended run.

Evidence (dogfood campaign)

  • The queue reviewer approved the over-sliced F5 queue (5 issues + scope creep) — it never asks "is this the right number of slices for the request?"
  • The plan reviewer, at full strictness, bounced a sound trivial plan 3× on style/truncation nits; recalibrating to "approve sound, request-changes only for substantive defects" then approved correctly. Calibration is load-bearing.
  • F2 failed doc_review because a mandated request-changes verdict and a demanding judge never converged within 2 cycles; the revise→review loop can diverge with a cheap grill model.

Proposed fix

  • Add an aggregate over-decomposition check to the queue rubric ("right number of slices for the request?").
  • Expose reviewer strictness (demanding vs demanding-but-fair) as an operator-tunable.
  • Guard the revise→review loop against divergence (cap cycles / detect non-convergence with cheap models).
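The second and third bullets could be sketched roughly as follows. This is a minimal illustration, not the project's actual reviewer code: `Strictness`, `Finding`, `LoopResult`, `run_review_loop`, and the `review` / `revise` callables are all hypothetical names, and treating "the same blocking findings twice in a row" as non-convergence is an assumed heuristic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Strictness(Enum):
    # Hypothetical operator-settable knob; values mirror the issue's wording.
    DEMANDING = "demanding"
    DEMANDING_BUT_FAIR = "demanding-but-fair"


@dataclass
class Finding:
    text: str
    substantive: bool  # False for style/truncation nits


@dataclass
class LoopResult:
    status: str  # "approved" or "non_converged"
    cycles: int
    open_findings: List[str]


def blocking(findings: List[Finding], strictness: Strictness) -> List[Finding]:
    """Return the findings that block approval at the given strictness."""
    if strictness is Strictness.DEMANDING:
        return list(findings)
    # demanding-but-fair: request changes only for substantive defects
    return [f for f in findings if f.substantive]


def run_review_loop(
    draft: str,
    review: Callable[[str], List[Finding]],
    revise: Callable[[str, List[Finding]], str],
    strictness: Strictness,
    max_cycles: int = 2,
) -> LoopResult:
    """Revise and re-review until approved, stalled, or out of cycles."""
    previous: Optional[List[str]] = None
    for cycle in range(1, max_cycles + 1):
        open_findings = blocking(review(draft), strictness)
        if not open_findings:
            return LoopResult("approved", cycle, [])
        texts = sorted(f.text for f in open_findings)
        if texts == previous:
            # Assumed heuristic: identical findings twice means no progress.
            return LoopResult("non_converged", cycle, texts)
        previous = texts
        if cycle < max_cycles:
            draft = revise(draft, open_findings)
    # Cycle cap reached: report it explicitly instead of failing silently.
    return LoopResult("non_converged", max_cycles, previous or [])
```

With a judge that only ever raises style nits, the `DEMANDING` setting ends in an explicit `non_converged` result, while `DEMANDING_BUT_FAIR` approves on the first cycle. Returning a distinct status gives the scheduler something to act on (escalate, switch model, or hand off to a human) rather than a generic gate failure.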

Acceptance criteria

  • Queue rubric includes an aggregate-slice-count check and flags over-decomposition.
  • Reviewer strictness is a documented, operator-settable knob.
  • The revise→review loop has a non-convergence guard rather than silently failing after N cycles.
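The first criterion could be checked along these lines. This is a sketch under stated assumptions: `Slice`, `in_scope`, `expected_max_slices`, and `check_aggregate_decomposition` are hypothetical, and how the expected slice count for a request is estimated is left open here (it might come from the planner or from an operator setting).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slice:
    title: str
    in_scope: bool  # hypothetical: does the slice map to something the request asked for?


def check_aggregate_decomposition(
    slices: List[Slice], expected_max_slices: int
) -> List[str]:
    """Queue-level flags, checked in addition to per-slice coherence."""
    flags: List[str] = []
    if len(slices) > expected_max_slices:
        flags.append(
            f"over-decomposition: {len(slices)} slices, "
            f"expected at most {expected_max_slices}"
        )
    creep = [s.title for s in slices if not s.in_scope]
    if creep:
        flags.append("scope creep: " + ", ".join(creep))
    return flags
```

The point of the sketch is that the check runs once over the whole queue, so a set of individually coherent slices can still be flagged when there are too many of them for the request.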

Source: dogfood/ITERATION_REPORT.md MINOR-7; dogfood/AUTOREVIEW_LOG.md.

Metadata

Labels

  • area:agents (Worker / planner / grill / slicer agent quality)
  • area:pipeline (Scheduler / gates / decomposition pipeline)
  • dogfood (Surfaced by the self-driving dogfood campaign)
  • enhancement (New feature or request)
  • minor (Minor: polish, cosmetic, or low-impact)
