
feat(api): RunSupervisor advise rung for the shadow observation-signal rules #294

Merged: xmap merged 4 commits into main from worktree-supervisor-advise-rung on Jun 22, 2026

@xmap xmap commented Jun 21, 2026

Owner

What

Promotes the RunSupervisor's three shadow observe-only rules one rung on the autonomy ladder: observe -> advise. When run_supervisor_advise_enabled is on (default off), each rule records one Decision per breach edge for a human, and still issues no command.

| Rule | Advise Decision |
| --- | --- |
| run-age run-liveness backstop (#273) | SupervisionQuieted |
| Rule R beam-aware rate-dropout (#288) | SupervisionStalled |
| Rule Q quality-below-limit (#288) | SupervisionBreached |

Three commits: (A) the Decision-BC vocab (7 -> 10 choices + vocab test), (B) the supervisor emission + config, (C) gate-review test additions.

Why

The shadow rules log would_flag but leave no durable record a human can triage. The advise rung records a Decision(context=RunSupervision, choice=...) per breach episode under the supervisor's identity + Authorize path, while keeping the act rung (auto-Hold) deferred. It climbs exactly one rung — no command is issued from these rules.

Trust posture (verified by the gate review)

  • Off by default, a further opt-in above each rule's own enable + run_supervisor_enabled.
  • Decision-only, never a command at the advise rung.
  • Edge-triggered: one Decision per breach episode (off the already-walled per-rule memory); a standing breach across ticks does not re-emit; cannot-tell paths still defer (no Decision).
  • Beam-free emitter for the liveness rule (it runs before the beam read); no shared-memory bleed into the beam-Hold FSM.
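The opt-in layering and edge-triggering described above can be illustrated with a minimal sketch. The names `RuleMemory`, `AdviseEmitter`, and `tick` are hypothetical and do not come from the diff; only the flag names and the choice strings are taken from this PR. The sketch also assumes that the per-rule memory is updated on every tick whether or not advise is on, which the description implies but does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class RuleMemory:
    """Hypothetical per-rule memory: was the previous tick in breach?"""
    in_breach: bool = False


@dataclass
class AdviseEmitter:
    """Hypothetical stand-in for the supervisor's advise path."""
    supervisor_enabled: bool       # mirrors run_supervisor_enabled
    advise_enabled: bool = False   # mirrors run_supervisor_advise_enabled (default off)
    decisions: list[str] = field(default_factory=list)

    def tick(self, memory: RuleMemory, rule_enabled: bool,
             would_flag: bool, choice: str) -> None:
        # A disabled rule never reaches the advise path.
        if not rule_enabled:
            return
        # Rising edge: in breach now, not in breach on the previous tick.
        rising_edge = would_flag and not memory.in_breach
        memory.in_breach = would_flag  # assumption: updated regardless of advise
        if rising_edge and self.supervisor_enabled and self.advise_enabled:
            # Record a Decision only; no command is issued at this rung.
            self.decisions.append(choice)


emitter = AdviseEmitter(supervisor_enabled=True, advise_enabled=True)
memory = RuleMemory()
for _ in range(2):  # two ticks of a standing breach
    emitter.tick(memory, rule_enabled=True, would_flag=True,
                 choice="SupervisionBreached")
print(emitter.decisions)
```

Under these assumptions, a standing breach across two ticks records a single entry, and a new entry appears only after the breach clears and recurs.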

Naming

SupervisionBreached is the naming-r3 rename of the originally-proposed SupervisionDoubted — "Doubted" read as the supervisor's epistemic state; "Breached" names the objective limit-crossing, family-uniform with SupervisionDeferred / Conflicted / Stalled. Both design memos were updated in lockstep.

Gate review

Focused 3-lens review on the diff: correctness/trust = ship (all 5 trust invariants sound), cross-BC/vocab = ship, test/fitness = changes-needed (a test-coverage gap, no correctness bug). Addressed in commit C: liveness edge-trigger test, plus two cannot-tell-under-advise tests (no Decision when the channel has no observation, and when the rule is disabled). A reviewer's worry that a value=None Decision could emit was verified false — the decider returns would_flag=False on a None reading; commit C pins that.
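The cannot-tell behavior pinned by commit C can be sketched as a small pure decider. This is an illustration only: `QualityVerdict` and `decide_quality` are hypothetical names, and the comparison `value < snr_limit` is an assumption drawn from the rule's "quality-below-limit" label. The source confirms only that a None reading yields `would_flag=False` and that `snr_limit` being None disables the rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityVerdict:
    """Hypothetical result shape for the quality rule's decider."""
    would_flag: bool
    reason: str


def decide_quality(value: Optional[float],
                   snr_limit: Optional[float]) -> QualityVerdict:
    # Rule disabled: no limit configured, so nothing can be flagged.
    if snr_limit is None:
        return QualityVerdict(False, "rule disabled")
    # Cannot tell: no observation on the channel, so defer rather than flag.
    if value is None:
        return QualityVerdict(False, "cannot tell: no observation")
    # Assumed comparison: a reading strictly below the limit is a breach.
    return QualityVerdict(value < snr_limit, "compared against limit")
```

Because both early returns carry `would_flag=False`, an edge-triggered emitter downstream has no rising edge to act on and records nothing.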

Deferred

The act rung (reversible auto-Hold on a confirmed breach) + the act-mode sim composition guard, per the design lock.

Test plan

Unit: each disposition emits exactly one Decision under advise-on with no command; advise-off records nothing; edge-triggering for all three rules; cannot-tell gates. Decision-BC vocab parity (closed-set == Literal == 10). Full suite + architecture fitness green on every code commit.
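The vocab parity check (closed-set == Literal == expected size) could look roughly like the sketch below. It lists only the three dispositions this PR names; the real `RunSupervisionChoice` has 10 values, and the other seven are not reproduced here. The helper `vocab_is_in_parity` is hypothetical, not the repository's test.

```python
from typing import Literal, get_args

# Only the three dispositions added in this PR; the real set has 10 values.
RunSupervisionChoice = Literal[
    "SupervisionQuieted",
    "SupervisionStalled",
    "SupervisionBreached",
]

RUN_SUPERVISION_CHOICES: frozenset[str] = frozenset(
    {"SupervisionQuieted", "SupervisionStalled", "SupervisionBreached"}
)


def vocab_is_in_parity(expected_size: int) -> bool:
    """True when the Literal, the frozenset, and the expected size all agree."""
    literal_values = get_args(RunSupervisionChoice)
    return (
        set(literal_values) == RUN_SUPERVISION_CHOICES
        and len(RUN_SUPERVISION_CHOICES) == expected_size
    )


# In this reduced sketch the expected size is 3; the PR's test pins 10.
assert vocab_is_in_parity(3)
```

Pinning the size as well as set equality means a value added to both the Literal and the frozenset still fails the test until the expected count is deliberately bumped.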

🤖 Generated with Claude Code

xmap and others added 3 commits June 21, 2026 22:52
Slice A of the observation-signal advise rung. Adds SupervisionQuieted
(run-age liveness backstop), SupervisionStalled (Rule R rate-dropout), and
SupervisionBreached (Rule Q quality-below-limit) to the RunSupervisionChoice
Literal + RUN_SUPERVISION_CHOICES frozenset (7 -> 10), with the vocab test
updated to the 10-value set + a work-noun guard on the new dispositions.

WHY: promoting the shipped shadow observation-signal + run-liveness rules
one rung (observe -> advise) means the supervisor records one Decision per
breach edge for a human; that Decision's choice must exist in the closed
set first. Decision-only dispositions (never a command). SupervisionBreached
is the naming-r3 rename of the originally-proposed SupervisionDoubted:
"Doubted" read as the supervisor's epistemic state; "Breached" names the
objective limit-crossing, family-uniform with Deferred / Conflicted /
Stalled. This slice adds vocabulary only; the supervisor emission lands next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Slice B of the observation-signal advise rung. Adds run_supervisor_advise_enabled
(default off, a further opt-in above each rule's own enable) and, when on, emits
exactly one Decision per breach EDGE from the three shadow rules -- still issuing
NO command (advise rung):
  - run-liveness backstop  -> SupervisionQuieted
  - Rule R rate-dropout    -> SupervisionStalled
  - Rule Q quality breach  -> SupervisionBreached

WHY: the shadow rules (#288 / #273) log would_flag but leave no durable record a
human can triage. The advise rung climbs exactly one step (observe -> advise),
recording one RunSupervision Decision per breach episode for a human while keeping
the act rung (auto-Hold) deferred. Emission is edge-triggered off the already-walled
per-rule memory (one Decision per episode; nothing on a standing breach across
ticks), beam-free (the liveness rule runs before the beam read), and reuses the
existing DecisionRegistered shape under the RunSupervisor identity + Authorize path.
Shadow logging is unchanged; advise only adds the Decision. cannot-tell still
defers (no Decision). Tests cover advise-off (no Decision), each disposition under
advise-on (one Decision, no command), and edge-triggering (one Decision across two
ticks of a standing breach).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Gate-review follow-ups (the advise diff drew 2 ship + 1 changes_needed, the
last purely a test-coverage gap; the correctness/trust lens passed clean).
Adds three tests:
  - advise liveness is edge-triggered: two ticks of a standing stale Run
    record only ONE SupervisionQuieted Decision (parity with the quality +
    stall edge-trigger tests).
  - advise records no Decision when the quality channel has no observation
    (cannot-tell -> defer; pins that the value-None path never emits, which a
    reviewer worried about -- the decider returns would_flag=False on None).
  - advise records no Decision when the rule is disabled (snr_limit None):
    advise respects each rule's own enable, not just the global advise flag.

Test-only; no production change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions Bot commented Jun 21, 2026


Coverage report

[Coverage table; columns: File, Statements, Missing, Coverage, Coverage (new stmts), Lines missing. Rows: apps/api/src/cora/api/_run_supervisor.py (1010), apps/api/src/cora/decision/aggregates/decision/state.py, apps/api/src/cora/infrastructure/config.py, Project Total. Cell values not preserved.]

This report was generated by python-coverage-comment-action

The diff-coverage gate (hard 90% on changed lines) flagged
_run_supervisor.py at 88.9%: the new _record_supervision_advice except
ConcurrencyError branch (lines 490-491) was uncovered. Adds an idempotency
test that re-derives the same advise Decision id (via a FixedIdGenerator
repeating the id) so the second append collides and is swallowed -- mirrors
the existing test_record_decision_is_idempotent_on_repeated_id for the
beam-Hold path. Test-only; covers the cross-restart re-emission no-op.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
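
The idempotency behavior covered by this last commit (a repeated Decision id collides on append and is swallowed) can be sketched as follows. Everything here is a hypothetical stand-in: `InMemoryDecisionStore`, this local `ConcurrencyError`, and `record_supervision_advice` only mimic the shape described in the commit message and are not the code in _run_supervisor.py.

```python
class ConcurrencyError(Exception):
    """Hypothetical stand-in for the store's append-collision error."""


class InMemoryDecisionStore:
    """Hypothetical append-only store keyed by Decision id."""

    def __init__(self) -> None:
        self._by_id: dict[str, str] = {}

    def append(self, decision_id: str, choice: str) -> None:
        # A second append under the same id is treated as a collision.
        if decision_id in self._by_id:
            raise ConcurrencyError(decision_id)
        self._by_id[decision_id] = choice

    def count(self) -> int:
        return len(self._by_id)


def record_supervision_advice(store: InMemoryDecisionStore,
                              decision_id: str, choice: str) -> bool:
    """Record an advise Decision; return False if it was already recorded."""
    try:
        store.append(decision_id, choice)
    except ConcurrencyError:
        # Same id re-derived (e.g. after a restart): swallow as a no-op.
        return False
    return True


store = InMemoryDecisionStore()
first = record_supervision_advice(store, "decision-1", "SupervisionQuieted")
second = record_supervision_advice(store, "decision-1", "SupervisionQuieted")
print(first, second, store.count())
```

Deriving the id deterministically is what makes the swallow safe in this sketch: a collision can only mean the same Decision was already recorded.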
@xmap xmap merged commit c71ef08 into main Jun 22, 2026
16 checks passed
@xmap xmap deleted the worktree-supervisor-advise-rung branch June 22, 2026 04:18