Flywheel is blind to the failures that actually happen (outcome taxonomy gap) #2

Description

@n1arash

Summary

foreman retro over the entire campaign drafted 0 proposals — not because runs were clean, but because the outcome taxonomy labelled 38 of 43 runs legacy/blank, including every one of the 21 killed_turns failures. Retro only clusters escalated / evaluator_bounce / human_rejected, so it saw just two single-instance escalation clusters and reasonably declined to patch. The flywheel is blind to the dominant failure.

Evidence (dogfood campaign)

  • Outcome distribution across all 43 runs: success_first_try 5 (12%), blank/legacy (unlabelled) 38 (88%).
  • killed_turns (21 runs, 49% of all runs) is never assigned a failure outcome.
  • foreman retro log: "No patch proposals were drafted." It only clustered [1×] escalated:(unspecified) and [1×] escalated:handoff.
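
The percentages above follow directly from the run counts. A minimal sketch of the arithmetic, using only the numbers reported in this issue (the variable names and dict layout are illustrative, not foreman's data model):

```python
from collections import Counter

# Counts taken from the evidence in this issue; the layout is illustrative only.
TOTAL_RUNS = 43
outcome_counts = Counter({"success_first_try": 5, "blank/legacy": 38})
killed_turns_runs = 21  # all of these sit inside the blank/legacy bucket


def share(count: int, total: int = TOTAL_RUNS) -> int:
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


unlabelled_pct = share(outcome_counts["blank/legacy"])  # 38 / 43
killed_turns_pct = share(killed_turns_runs)             # 21 / 43
print(f"unlabelled: {unlabelled_pct}%, killed_turns: {killed_turns_pct}%")
```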

Why it matters

This is the central Phase-2 finding. The flywheel can only improve what it can measure, and it isn't measuring the dominant failure. An operator would conclude "the system is fine, retro found nothing" — which is false.

Proposed fix

  • (a) Stamp an outcome on every terminal run, including phase agents (planner / grill / slicer) and the kill reasons (killed_turns / killed_cost / killed_timeout / error). No more blank/legacy.
  • (b) Make retro.cluster_failures cluster on terminal_reason kills, not just escalations.
  • (c) Treat a high killed_turns rate as a first-class proposal trigger (it would have proposed the turn-budget fix directly).
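
One way fixes (a) to (c) could fit together, as a rough sketch rather than foreman's actual code. Everything here is an assumption made for illustration: the Run record, stamp_outcome, draft_proposals, the "unclassified" fallback label, and the 25% threshold. Only the outcome and kill-reason names come from this issue, and cluster_failures is a stand-in for retro.cluster_failures whose real signature is not shown here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout; not foreman's actual API.
KILL_REASONS = {"killed_turns", "killed_cost", "killed_timeout", "error"}
RETRO_OUTCOMES = {"escalated", "evaluator_bounce", "human_rejected"}
KILL_RATE_THRESHOLD = 0.25  # assumed trigger level, to be tuned


@dataclass
class Run:
    agent: str
    terminal_reason: Optional[str] = None
    outcome: Optional[str] = None


def stamp_outcome(run: Run) -> Run:
    """(a) Give every terminal run a non-blank outcome."""
    if run.outcome:
        return run
    if run.terminal_reason in KILL_REASONS:
        run.outcome = run.terminal_reason
    else:
        run.outcome = "unclassified"  # explicit bucket instead of blank/legacy
    return run


def cluster_failures(runs: list[Run]) -> Counter:
    """(b) Cluster on kill reasons as well as the existing failure outcomes."""
    failures = RETRO_OUTCOMES | KILL_REASONS
    return Counter(r.outcome for r in runs if r.outcome in failures)


def draft_proposals(runs: list[Run]) -> list[str]:
    """(c) Draft a proposal when a kill reason's rate crosses the threshold."""
    clusters = cluster_failures(runs)
    return [
        f"investigate {reason}: {count}/{len(runs)} runs"
        for reason, count in clusters.items()
        if reason in KILL_REASONS and count / len(runs) >= KILL_RATE_THRESHOLD
    ]
```

With stamping in place, the 21 killed_turns runs would form a single large cluster instead of disappearing into blank/legacy, which is what lets the rate check in (c) fire.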

Acceptance criteria

  • Every terminal run (incl. planner/grill/slicer) carries a non-blank outcome label.
  • Kill reasons are first-class outcomes in the taxonomy.
  • retro.cluster_failures clusters on kill reasons and surfaces a cluster for killed_turns.
  • A high kill-rate cluster produces a drafted proposal.
  • Regression test: a fixture of killed runs yields a non-empty retro proposal set.
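
The regression criterion could be checked along these lines. This is a sketch under stated assumptions: the fixture shape, assert_retro_sees_kills, and stand_in_retro are all hypothetical, and the real test would pass foreman's own retro entry point in place of the stand-in.

```python
from typing import Callable

# Hypothetical fixture shape; foreman's real run records may differ.
KILLED_FIXTURE = [
    {"agent": agent, "terminal_reason": "killed_turns", "outcome": "killed_turns"}
    for agent in ("planner", "grill", "slicer")
]


def assert_retro_sees_kills(retro: Callable[[list[dict]], list[str]]) -> None:
    """Fail unless retro drafts at least one proposal mentioning killed_turns."""
    # No terminal run in the fixture may carry a blank outcome.
    assert all(run["outcome"] for run in KILLED_FIXTURE)
    proposals = retro(KILLED_FIXTURE)
    assert proposals, "retro drafted no proposals for a fixture of killed runs"
    assert any("killed_turns" in p for p in proposals)


def stand_in_retro(runs: list[dict]) -> list[str]:
    """Placeholder for the real retro entry point (an assumption)."""
    reasons = {r["outcome"] for r in runs if r["outcome"].startswith("killed_")}
    return [f"review budget for {reason}" for reason in sorted(reasons)]


assert_retro_sees_kills(stand_in_retro)
```

A retro that still ignores kill reasons returns an empty list for this fixture and fails the second assertion, which is the behaviour reported in this issue.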

Source: dogfood/ITERATION_REPORT.md BLOCKER-2; dogfood/FLYWHEEL.md; dogfood/METRICS.md.

Metadata

    Labels

    • area:flywheel (Retro / learning loop / skill patching)
    • blocker (Critical; blocks the core value proposition)
    • bug (Something isn't working)
    • dogfood (Surfaced by the self-driving dogfood campaign)
