refactor(agent): extract the shared flag-only-watcher scaffold#308

Merged
xmap merged 1 commit into main from worktree-flag-watcher-scaffold
Jun 22, 2026

@xmap xmap commented Jun 22, 2026

Summary

The three deterministic flag-only watchers (ClearanceWatcher, CalibrationWatcher, ProcedureWatcher) had fired the rule-of-three: each was a structural near-clone carrying the same agent-invariant mechanics. This hoists those into a new cora.api._flag_watcher module so the next flag-only watcher is a thin consumer, not a fourth copy.

The scaffold owns:

  • is_stalled — the pure staleness comparison (inclusive >= boundary).
  • derive_watcher_decision_id — the per-episode deterministic id (uuid5(namespace, "decision:{entity_id}:{episode_at}")) that makes a re-flag of the same stall episode a ConcurrencyError no-op.
  • record_watcher_decision — the DecisionRegistered envelope + idempotent append.
  • flag_watcher_lifespan — the off-by-default gate, the periodic loop (a failed tick is logged and retried, cancellation propagates), and task teardown.
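
A minimal sketch of the first two helpers, assuming plain datetime inputs. The signatures, the example namespace, and the isoformat() rendering of episode_at are illustrative assumptions, not the actual cora.api._flag_watcher code:

```python
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical namespace for illustration; the real per-agent namespace UUIDs differ.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example-watcher.invalid")


def is_stalled(last_progress_at: datetime, now: datetime, window: timedelta) -> bool:
    """Inclusive boundary: sitting for exactly `window` already counts as stalled."""
    return now - last_progress_at >= window


def derive_watcher_decision_id(
    namespace: uuid.UUID, entity_id: str, episode_at: datetime
) -> uuid.UUID:
    """The same (entity, episode) pair always maps to the same id."""
    # Assumption: episode_at is rendered with isoformat(); the real formatting is not stated.
    return uuid.uuid5(namespace, f"decision:{entity_id}:{episode_at.isoformat()}")


held_at = datetime(2026, 6, 22, 9, 0, tzinfo=timezone.utc)
window = timedelta(hours=1)

# Re-deriving for the same episode yields the same id; a new episode yields a new one.
first = derive_watcher_decision_id(EXAMPLE_NAMESPACE, "entity-1", held_at)
again = derive_watcher_decision_id(EXAMPLE_NAMESPACE, "entity-1", held_at)
later = derive_watcher_decision_id(EXAMPLE_NAMESPACE, "entity-1", held_at + window)
```

Because the id is a pure function of namespace, entity, and episode time, a second tick over the same stall targets the same Decision stream instead of minting a new one.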

Each watcher keeps what genuinely differs per agent: its drain (list query + status filter), its recency fold (clearance UnderReview review-step; procedure Running activity recency; calibration none), its clock source, its Decision vocabulary, and its namespace UUID — plus a thin per-agent _record_decision / _derive_decision_id / is_stalled surface delegating to the scaffold.

Behavior preservation

This is a behavior-preserving refactor. All 56 existing watcher unit tests pass verbatim (none were touched). The deterministic Decision ids (per-agent namespaces ffff0002 / ca110002 / 0c0c0002), the full envelope (decided_by, context/choice/rule, verbatim reasoning, identical inputs dicts, confidence_source, event_id, append at expected_version=0 with the ConcurrencyError swallow), the gates, the folds, and the off-by-default lifespan are all unchanged. The only difference is cosmetic: the flagged log line now keys entity_id= instead of <entity>_id=, which no test asserts on.
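
The "append at expected_version=0 with the ConcurrencyError swallow" pattern can be sketched as follows. The in-memory store and the record_once helper are hypothetical stand-ins for illustration; only the ConcurrencyError name and the expected_version=0 convention come from this PR:

```python
class ConcurrencyError(Exception):
    """Raised when a stream is not at the version the writer expected."""


class InMemoryEventStore:
    """Hypothetical stand-in for the real event store."""

    def __init__(self) -> None:
        self.streams: dict[str, list[dict]] = {}

    def append(self, stream_id: str, event: dict, expected_version: int) -> None:
        current_version = len(self.streams.get(stream_id, []))
        if current_version != expected_version:
            raise ConcurrencyError(f"{stream_id} is at version {current_version}")
        self.streams.setdefault(stream_id, []).append(event)


def record_once(store: InMemoryEventStore, decision_id: str, event: dict) -> bool:
    """Append only if the stream is new; treat an existing stream as already recorded."""
    try:
        store.append(decision_id, event, expected_version=0)
    except ConcurrencyError:
        # The same stall episode was flagged on an earlier tick: a no-op.
        return False
    return True


store = InMemoryEventStore()
event = {"type": "DecisionRegistered"}
first_attempt = record_once(store, "decision-1", event)
second_attempt = record_once(store, "decision-1", event)
```

Combined with a deterministic Decision id, this makes re-flagging idempotent without the watcher having to read the stream first.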

Gate-reviewed (behavior-preservation diff vs origin/main for all three watchers, naming-r3, seam + coverage): ship-with-nits, no P0/P1. The naming nit is applied — the shared helpers are record_watcher_decision / derive_watcher_decision_id, not *_flag_*, because a Decision's choice can literally be Flag (owned by ClearanceProgress), so "flag" must not modify "decision" (it stays the agent-family adjective in _flag_watcher / flag_watcher_lifespan).

Tests

test_flag_watcher.py pins the loop's cancel-propagation contract (an in-flight tick is cancelled cleanly on lifespan exit) — the one scaffold line the per-watcher suites do not reach. Local: ruff, pyright, tach, architecture (26,895), full unit tier (10,465) all green.
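
The loop contract described above (a failed tick is logged and retried, cancellation propagates) can be sketched with plain asyncio. The run_periodic and demo names are hypothetical; this is not the scaffold's code or the actual test:

```python
import asyncio
import logging

logger = logging.getLogger("flag_watcher_sketch")


async def run_periodic(tick, interval_s: float) -> None:
    """Run tick forever; log ordinary failures, let cancellation through."""
    while True:
        try:
            await tick()
        except Exception:
            # asyncio.CancelledError derives from BaseException, so it is not caught here.
            logger.exception("watcher tick failed; retrying on the next interval")
        await asyncio.sleep(interval_s)


async def demo() -> tuple[int, bool]:
    calls = 0
    in_flight = asyncio.Event()

    async def tick() -> None:
        nonlocal calls
        calls += 1
        if calls == 1:
            raise RuntimeError("transient failure")
        in_flight.set()
        await asyncio.sleep(3600)  # simulate a long in-flight tick

    task = asyncio.create_task(run_periodic(tick, 0))
    await in_flight.wait()
    task.cancel()  # stands in for lifespan exit
    try:
        await task
    except asyncio.CancelledError:
        return calls, True
    return calls, False
```

Running asyncio.run(demo()) returns (2, True): the first tick fails and is retried, and the second is cancelled mid-flight, with the cancellation surfacing from the task rather than being swallowed by the retry handler.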

This is the first of two: a follow-up PR adds CampaignWatcher as the scaffold's first new consumer.

🤖 Generated with Claude Code

The three deterministic flag-only watchers (ClearanceWatcher, CalibrationWatcher,
ProcedureWatcher) had fired the rule-of-three: each was a near-clone carrying the
same agent-invariant mechanics. This hoists those into a new
cora.api._flag_watcher module:

- is_stalled: the pure staleness comparison (inclusive >= boundary).
- derive_watcher_decision_id: the per-episode deterministic id
  (uuid5(namespace, "decision:{entity_id}:{episode_at}")) that makes a re-flag of
  the same stall episode a ConcurrencyError no-op.
- record_watcher_decision: the DecisionRegistered envelope + idempotent append.
- flag_watcher_lifespan: the off-by-default gate, the periodic loop (a failed
  tick is logged and retried, cancellation propagates), and task teardown.

Each watcher keeps what genuinely differs per agent: its drain (which list query
+ status filter), its recency fold (clearance UnderReview review-step;
procedure Running activity recency; calibration none), its clock source, its
Decision vocabulary, and its namespace UUID. Each also keeps a thin per-agent
_record_decision / _derive_decision_id / is_stalled surface delegating to the
scaffold, so behavior (and every existing test) is unchanged: all 56 watcher
unit tests pass verbatim, the behavior-preservation proof.

naming: the shared envelope/id helpers are record_watcher_decision /
derive_watcher_decision_id, not *_flag_* -- "flag" is the agent-family adjective
(flag-only watcher), but a Decision's choice can literally be "Flag" (owned by
ClearanceProgress), so it must not modify "decision". A new test_flag_watcher.py
pins the loop's cancel-propagation contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Coverage report

Files in the report, under apps/api/src/cora/api: _calibration_watcher.py, _clearance_watcher.py, _flag_watcher.py, _procedure_watcher.py.

This report was generated by python-coverage-comment-action

@xmap xmap merged commit 8273721 into main Jun 22, 2026
16 checks passed
@xmap xmap deleted the worktree-flag-watcher-scaffold branch June 22, 2026 10:11
xmap added a commit that referenced this pull request Jun 22, 2026
…#310)

CORA's 9th seeded agent and the first new consumer of the shared
cora.api._flag_watcher scaffold (PR #308). A deterministic, flag-only,
composition-root periodic watcher: each tick it lists Held campaigns
(operator-paused) and records one Decision(context=CampaignProgress,
choice=Stuck) per stuck episode for any whose last_status_changed_at (the time
it was held) has sat past an operator window without being resumed or closed.
It issues no command (it surfaces the forgotten pause so a human resumes or
closes the campaign). Off by default; gates on Actor.active.

On the scaffold it is a thin module: the staleness rule, the per-episode
Decision id, the DecisionRegistered envelope, and the loop/lifespan come from
_flag_watcher; this module owns only the Held drain, the campaign vocabulary,
and the namespace. The simplest consumer yet: no activity fold needed, because
Held makes no run-execution progress (last_status_changed_at, advanced only by
resume/close, is the true clock; membership curation touches only run_count). A
defensive status==Held re-check guards a future filter widening.

naming-r3: context CampaignProgress (family-clean with ClearanceProgress /
ProcedureProgress); choice Stuck -- the ideation's proposed "reuse Stall" would
have collided (Stall is owned by ProcedureProgress, and choice tokens must be
globally unique in the DecisionChoice projection), so this context owns its own
token. Agent kind CampaignWatcher; agent id in a new cab1 block.

No migration: proj_campaign_summary already carries last_status_changed_at +
admits Held, and list_campaigns already filters by status. v1 watches Held
only; Planned (legitimately not-started-yet) is deferred to a later variant.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>