feat: implement act apply-model and report baseline tripwire (#607) by ozymandiashh · Pull Request #616 · getagentseal/codeburn

ozymandiashh · 2026-07-03T22:58:41Z

Summary

Implements the design-gated #607 path for model default recommendations from real compare data.

This PR adds the no-proxy version of model routing: CodeBurn can now notice when a cheaper same-provider model has matched the current dominant model's edit reliability for a project, show the evidence in compare and optimize, and let the user explicitly apply that recommendation with codeburn act apply-model <project>.

The implementation intentionally stays conservative:

no per-turn routing
no request rewriting
no automatic apply
no effortLevel recommendation in v1
no cross-provider recommendations
every write goes through the existing action journal and can be undone

Closes #607.

Why this exists

The acting-layer epic deliberately avoided live routing and provider interception. #607 is the safe middle ground: if local history already shows that a project can use a cheaper model without losing edit reliability, CodeBurn can recommend a project-level default model change.

That keeps the decision reviewable and reversible. The user still controls the session and can override with --model when a particular task needs a different model.

Design followed

This PR follows the thresholds and rollout described in the #607 design thread:

Compare only models with at least 30 edit turns in the same project.
Recommend the candidate only if its one-shot rate is within 3 percentage points of the current dominant model.
Require candidate cost per edit turn to be at most 60 percent of the current model's cost per edit turn.
For debugging-heavy projects, defined as more than 40 percent Debugging edit turns, remove the tolerance and require the candidate to meet or beat the current one-shot rate.
Require both models to have been observed in the project within the last 14 days.
Keep v1 same-provider only.
Write only the Claude Code model setting. effortLevel is intentionally deferred because compare does not have per-effort evidence yet.
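
As a reading aid, the thresholds above can be sketched as a single predicate. This is a minimal sketch, not the code in src/act/model-defaults.ts: the `ModelStats` shape, the constant names, and the `qualifies` function are hypothetical, and rates are assumed to be stored as fractions rather than percentages.

```typescript
// Hypothetical per-model stats for one project; the real compare types may differ.
interface ModelStats {
  model: string;
  provider: string;
  editTurns: number;
  oneShotRate: number;      // fraction in [0, 1]
  costPerEditTurn: number;  // USD per edit turn
  lastSeenDaysAgo: number;
}

// Thresholds taken from the design list above; the names are illustrative.
const MIN_EDIT_TURNS = 30;
const ONE_SHOT_TOLERANCE = 0.03;    // 3 percentage points
const MAX_COST_RATIO = 0.6;         // candidate cost <= 60% of current cost
const DEBUGGING_HEAVY_SHARE = 0.4;  // more than 40% Debugging edit turns
const MAX_AGE_DAYS = 14;

function qualifies(
  current: ModelStats,
  candidate: ModelStats,
  debuggingShare: number,
): boolean {
  if (current.provider !== candidate.provider) return false;
  if (current.editTurns < MIN_EDIT_TURNS || candidate.editTurns < MIN_EDIT_TURNS) return false;
  if (current.lastSeenDaysAgo > MAX_AGE_DAYS || candidate.lastSeenDaysAgo > MAX_AGE_DAYS) return false;
  // Unclear cost data is treated as "no recommendation".
  if (!(current.costPerEditTurn > 0)) return false;
  if (candidate.costPerEditTurn > MAX_COST_RATIO * current.costPerEditTurn) return false;
  // Debugging-heavy projects get no tolerance on the one-shot rate.
  const tolerance = debuggingShare > DEBUGGING_HEAVY_SHARE ? 0 : ONE_SHOT_TOLERANCE;
  return candidate.oneShotRate >= current.oneShotRate - tolerance;
}
```

Every check returns false on failure, which matches the stated bias toward silence: a candidate has to pass all guardrails before anything is surfaced.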

What changed

Recommendation engine

Adds src/act/model-defaults.ts.

The engine consumes the existing compare stats pipeline and produces one recommendation per qualifying project. Each recommendation includes the evidence reviewers need to judge it:

project path and display name
current dominant model
candidate model
provider
edit turn counts for both models
one-shot rates for both models
cost per edit turn for both models
cost ratio
debugging-heavy flag

The engine refuses to recommend when any guardrail fails: insufficient volume, stale observations, cost that is not actually lower, reliability below threshold, or provider mismatch.
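
For orientation, the evidence fields listed above could be carried in a record like the one below. This is a sketch under assumptions: the `Recommendation` and `ModelEvidence` names, the field names, and the `formatEvidence` helper are hypothetical and are not taken from the PR's source.

```typescript
// Hypothetical evidence shapes; field names are illustrative only.
interface ModelEvidence {
  model: string;
  editTurns: number;
  oneShotRate: number;      // fraction in [0, 1]
  costPerEditTurn: number;  // USD per edit turn
}

interface Recommendation {
  projectPath: string;
  displayName: string;
  provider: string;
  current: ModelEvidence;
  candidate: ModelEvidence;
  costRatio: number;        // candidate cost divided by current cost
  debuggingHeavy: boolean;
}

// Renders a one-line summary a reviewer could scan; the wording is made up.
function formatEvidence(r: Recommendation): string {
  const pct = (x: number): string => `${(x * 100).toFixed(1)}%`;
  return (
    `${r.displayName}: ${r.candidate.model} ${pct(r.candidate.oneShotRate)} one-shot ` +
    `over ${r.candidate.editTurns} edit turns vs ${r.current.model} ` +
    `${pct(r.current.oneShotRate)} over ${r.current.editTurns}; ` +
    `cost ratio ${r.costRatio.toFixed(2)}` +
    (r.debuggingHeavy ? " (debugging-heavy)" : "")
  );
}
```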

Explicit apply command

Adds:

codeburn act apply-model <project>

The command:

resolves the project from parsed local sessions
recomputes the recommendation at apply time
writes <project>/.claude/settings.json
sets only the model key
preserves the rest of the JSON object
stores an expectedHash when the settings file already exists
writes through runAction() as kind: model-default
prints the evidence line
prints the undo command
prints the per-session override hint

This command is intentionally explicit. It is not wired into optimize --apply --yes.
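
The write step can be illustrated with a small pure function. This is a minimal sketch, assuming the settings file holds a JSON object and that expectedHash is a SHA-256 digest of the existing file content; the hash algorithm, the `ModelWritePlan` type, and the `planModelWrite` name are assumptions, not details confirmed by the PR.

```typescript
import { createHash } from "node:crypto";

// Hypothetical plan shape; the real apply-plan builder may differ.
interface ModelWritePlan {
  nextContent: string;
  expectedHash: string | undefined; // only set when the file already existed
}

function planModelWrite(existingContent: string | null, model: string): ModelWritePlan {
  let settings: Record<string, unknown> = {};
  let expectedHash: string | undefined;

  if (existingContent !== null) {
    const parsed: unknown = JSON.parse(existingContent);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error("settings.json must contain a JSON object");
    }
    settings = parsed as Record<string, unknown>;
    // Hash of the content as read, so a later write can detect a stale file.
    expectedHash = createHash("sha256").update(existingContent).digest("hex");
  }

  // Set only the model key; every other key is carried over unchanged.
  const next = { ...settings, model };
  return { nextContent: JSON.stringify(next, null, 2) + "\n", expectedHash };
}
```

Keeping the planning step free of file I/O means the plan can be computed and inspected before anything is written, which fits a flow where the actual write is handed to a journaled action runner.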

Compare and optimize surfaces

Adds low-key recommendation blocks in both places where users already inspect model and waste evidence:

codeburn compare
codeburn optimize

These blocks are informational. They show the evidence and point to codeburn act apply-model <project> instead of mutating anything.

Act report integration

Extends codeburn act report for model-default actions.

Model default rows are not token or dollar claims. They are correlation-only quality checks:

capture the candidate model's pre-apply one-shot baseline
remeasure the same model after apply
require at least 20 post-apply edit turns before reporting
flag "quality regression, consider undo" if the post-apply one-shot rate drops more than 5 percentage points below that baseline
otherwise report correlation, not attribution

This keeps the accounting honest. We do not claim savings from a model-default change in v1 because the causal story is weaker than for config-token removals.
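
The report rule described above reduces to a three-way decision. The sketch below assumes rates are fractions; the verdict labels, constant names, and the `modelDefaultVerdict` function are hypothetical and only illustrate the rule, not the code in src/act/report.ts.

```typescript
// Values from the list above; the names are illustrative.
const MIN_POST_APPLY_EDIT_TURNS = 20;
const REGRESSION_DROP = 0.05; // 5 percentage points

type ModelDefaultVerdict = "insufficient-data" | "quality-regression" | "correlation-only";

function modelDefaultVerdict(
  baselineOneShot: number, // candidate model's one-shot rate before apply
  postOneShot: number,     // same model's one-shot rate after apply
  postEditTurns: number,
): ModelDefaultVerdict {
  // Too few post-apply edit turns: nothing is reported yet.
  if (postEditTurns < MIN_POST_APPLY_EDIT_TURNS) return "insufficient-data";
  // Tripwire: a drop of more than 5 points suggests considering undo.
  if (baselineOneShot - postOneShot > REGRESSION_DROP) return "quality-regression";
  // Otherwise the row is a correlation note, never a savings claim.
  return "correlation-only";
}
```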

Reviewer guide

Suggested review order:

tests/act-model-defaults.test.ts
- This captures the intended behavior first: qualifying recommendation, insufficient volume, one-shot guard, debugging guard, stale models, provider mismatch, apply, and undo.
src/act/model-defaults.ts
- Main recommendation logic and apply-plan builder.
- Check the threshold constants and the expectedHash behavior.
src/act/cli.ts
- New act apply-model <project> command.
- Verify that apply recomputes the recommendation and stays explicit.
src/compare.tsx and src/optimize.ts
- Informational recommendation blocks only.
- No automatic mutation path.
src/act/report.ts
- Correlation-only model-default reporting and quality tripwire.

Safety model

This PR keeps the acting-layer invariants intact:

passive analyzer behavior is unchanged unless the user runs the explicit apply command
no hook, proxy, or request-path component is added
model-default changes are journaled
undo restores through the existing action framework
stale settings files are protected by expectedHash
recommendations require local project evidence, not cross-project borrowing
v1 only writes model, never effortLevel
v1 is same-provider only
optimize --apply --yes does not apply model defaults

Files changed

src/act/model-defaults.ts
- New recommendation engine and apply-plan builder.
src/act/cli.ts
- Adds codeburn act apply-model <project>.
src/act/report.ts
- Adds model-default report rows and the one-shot quality regression tripwire.
src/compare.tsx
- Adds recommendation block to compare output.
src/optimize.ts
- Adds recommendation block to optimize output.
tests/act-model-defaults.test.ts
- Adds focused coverage for the recommendation engine, guardrails, apply, and undo.

Verification

Local verification on branch feat/issue-607-model-defaults:

./node_modules/.bin/tsc --noEmit

Result: clean.

./node_modules/.bin/vitest run tests/act-model-defaults.test.ts

Result: 8 tests passing.

npm run build

Result: successful build, including dashboard build.

./node_modules/.bin/vitest run

Result: 1500 of 1503 tests passing locally.

The 3 failing tests reproduce on main, so they are not introduced by this PR:

tests/cli-status-menubar.test.ts, 2 failures on main
tests/cli-proxy-path.test.ts, 1 failure on main

Also verified before opening the PR:

final diff contains only the 6 intended source/test files
no AppleDouble ._* files remain
generated data files are not included in the diff
branch pushed cleanly to origin/feat/issue-607-model-defaults

Notes for maintainers

The main thing to scrutinize is not whether the command works, but whether the recommendation should exist at all under borderline data. The implementation is deliberately biased toward silence. If volume, reliability, recency, cost, provider family, or debugging-heavy safety is unclear, it returns no recommendation.

That is intentional. A missing recommendation is much cheaper than a bad model default.

ozymandiashh added 2 commits July 4, 2026 01:10

feat: implement act apply-model and report baseline tripwire (#607)

ee0ed31

feat: implement act apply-model and report baseline tripwire (#607)

34ca31e

ozymandiashh requested a review from iamtoruk July 3, 2026 23:04

ozymandiashh wants to merge 2 commits into main from feat/issue-607-model-defaults
