feat: implement act apply-model and report baseline tripwire (#607) #616
Open
ozymandiashh wants to merge 2 commits into
Summary
Implements the design-gated #607 path for model default recommendations from real compare data.
This PR adds the no-proxy version of model routing: CodeBurn can now notice when a cheaper same-provider model has matched the current dominant model's edit reliability for a project, show the evidence in `compare` and `optimize`, and let the user explicitly apply that recommendation with `codeburn act apply-model <project>`.

The implementation intentionally stays conservative:

- no `effortLevel` recommendation in v1

Closes #607.
Why this exists
The acting-layer epic deliberately avoided live routing and provider interception. #607 is the safe middle ground: if local history already shows that a project can use a cheaper model without losing edit reliability, CodeBurn can recommend a project-level default model change.
That keeps the decision reviewable and reversible. The user still controls the session and can override with `--model` when a particular task needs a different model.

Design followed
This PR follows the thresholds and rollout described in the #607 design thread:
- `model` setting.
- `effortLevel` is intentionally deferred because compare does not have per-effort evidence yet.

What changed
Recommendation engine
Adds `src/act/model-defaults.ts`.

The engine consumes the existing compare stats pipeline and produces one recommendation per qualifying project. Each recommendation includes the evidence reviewers need to judge it:
The engine refuses to recommend when any guardrail fails: insufficient volume, stale observations, a candidate that is not actually cheaper, reliability below threshold, or a provider mismatch.
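The guardrail logic above can be sketched roughly as follows. This is an illustration only: the `ModelStats` and `Guardrails` shapes, the `evaluateCandidate` name, the field names, the model names, and the threshold values in the usage lines are all assumptions made for the example. They are not the actual contents of `src/act/model-defaults.ts` or the thresholds from the #607 design thread.

```typescript
// Hypothetical per-model stats for one project; the real compare stats may differ.
interface ModelStats {
  model: string;
  provider: string;
  editTurns: number;        // number of observed edit turns
  oneShotRate: number;      // fraction of edits that succeeded first try, 0..1
  costPerEditUsd: number;   // average cost per edit turn
  lastSeenDaysAgo: number;  // age of the most recent observation
}

// Hypothetical guardrail thresholds, passed in rather than hard-coded.
interface Guardrails {
  minEditTurns: number;
  maxStalenessDays: number;
  maxReliabilityGap: number; // how far the candidate may trail the current model
}

type Verdict =
  | { recommend: true; model: string }
  | { recommend: false; reason: string };

// Returns a recommendation only when every guardrail passes; otherwise stays silent
// and reports which guardrail failed first.
function evaluateCandidate(current: ModelStats, candidate: ModelStats, g: Guardrails): Verdict {
  if (candidate.provider !== current.provider) {
    return { recommend: false, reason: "provider mismatch" };
  }
  if (candidate.editTurns < g.minEditTurns) {
    return { recommend: false, reason: "insufficient volume" };
  }
  if (candidate.lastSeenDaysAgo > g.maxStalenessDays) {
    return { recommend: false, reason: "stale observations" };
  }
  if (candidate.costPerEditUsd >= current.costPerEditUsd) {
    return { recommend: false, reason: "cost not lower" };
  }
  if (candidate.oneShotRate < current.oneShotRate - g.maxReliabilityGap) {
    return { recommend: false, reason: "reliability below threshold" };
  }
  return { recommend: true, model: candidate.model };
}

// Usage with made-up numbers: the candidate has too few observations, so no recommendation.
const exampleLimits: Guardrails = { minEditTurns: 50, maxStalenessDays: 30, maxReliabilityGap: 0.02 };
const exampleCurrent: ModelStats = {
  model: "model-large", provider: "example", editTurns: 400,
  oneShotRate: 0.9, costPerEditUsd: 0.12, lastSeenDaysAgo: 1,
};
const exampleCandidate: ModelStats = {
  model: "model-small", provider: "example", editTurns: 3,
  oneShotRate: 0.95, costPerEditUsd: 0.03, lastSeenDaysAgo: 2,
};
console.log(evaluateCandidate(exampleCurrent, exampleCandidate, exampleLimits));
```

Ordering the checks as early returns keeps the default outcome "no recommendation", which matches the conservative bias described in this PR.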
Explicit apply command
Adds:
The command:

- `<project>/.claude/settings.json`
- `model` key
- `expectedHash` when the settings file already exists
- `runAction()` as `kind: model-default`

This command is intentionally explicit. It is not wired into `optimize --apply --yes`.

Compare and optimize surfaces
Adds low-key recommendation blocks in both places where users already inspect model and waste evidence:
- `codeburn compare`
- `codeburn optimize`

These blocks are informational. They show the evidence and point to `codeburn act apply-model <project>` instead of mutating anything.

Act report integration
Extends `codeburn act report` for `model-default` actions.

Model default rows are not token or dollar claims. They are correlation-only quality checks:

- `quality regression, consider undo` if post-apply one-shot rate drops more than 5 percentage points
- `correlation, not attribution`

This keeps the accounting honest. We do not claim savings from a model-default change in v1 because the causal story is weaker than for config-token removals.
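As a rough illustration of the tripwire rule above, the check reduces to comparing two one-shot rates in percentage points. The function and field names here (`checkModelDefault`, `QualityCheck`, `dropPoints`) are hypothetical and are not taken from `src/act/report.ts`. Only the 5-point threshold and the two label strings come from the description above.

```typescript
// Hypothetical result shape for one model-default row in the report.
interface QualityCheck {
  dropPoints: number;   // baseline minus post-apply one-shot rate, in percentage points
  flag: string | null;  // set only when the drop exceeds the threshold
  caveat: string;       // always attached: this is a correlation, not a savings claim
}

// Threshold from the PR description: a drop of more than 5 percentage points.
const REGRESSION_THRESHOLD_POINTS = 5;

// Both rates are expressed as percentages (0..100), so the difference is in points.
function checkModelDefault(baselineOneShotPct: number, postApplyOneShotPct: number): QualityCheck {
  const dropPoints = baselineOneShotPct - postApplyOneShotPct;
  return {
    dropPoints,
    flag: dropPoints > REGRESSION_THRESHOLD_POINTS ? "quality regression, consider undo" : null,
    caveat: "correlation, not attribution",
  };
}

// Usage with made-up rates: 90% before the change, 84% after.
console.log(checkModelDefault(90, 84));
```

The comparison is strict, so in this sketch a drop of exactly 5 points does not raise the flag; whether the real report treats the boundary the same way is not stated in this PR.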
Reviewer guide
Suggested review order:
1. `tests/act-model-defaults.test.ts`
2. `src/act/model-defaults.ts`
3. `src/act/cli.ts`: `act apply-model <project>` command.
4. `src/compare.tsx` and `src/optimize.ts`
5. `src/act/report.ts`

Safety model
This PR keeps the acting-layer invariants intact:
- `expectedHash`
- `model`, never `effortLevel`
- `optimize --apply --yes` does not apply model defaults

Files changed
- `src/act/model-defaults.ts`
- `src/act/cli.ts`: `codeburn act apply-model <project>`.
- `src/act/report.ts`
- `src/compare.tsx`
- `src/optimize.ts`
- `tests/act-model-defaults.test.ts`

Verification
Local verification on branch `feat/issue-607-model-defaults`:

Result: clean.
Result: 8 tests passing.
Result: successful build, including dashboard build.
Result: 1500 of 1503 tests passing locally.
The 3 failing tests reproduce on `main`, so they are not introduced by this PR:

- `tests/cli-status-menubar.test.ts`, 2 failures on `main`
- `tests/cli-proxy-path.test.ts`, 1 failure on `main`

Also verified before opening the PR:
- no `._*` files remain
- `origin/feat/issue-607-model-defaults`

Notes for maintainers
The main thing to scrutinize is not whether the command works, but whether the recommendation should exist at all under borderline data. The implementation is deliberately biased toward silence. If volume, reliability, recency, cost, provider family, or debugging-heavy safety is unclear, it returns no recommendation.
That is intentional. A missing recommendation is much cheaper than a bad model default.