Per-turn worker tool-use cost cap (soft default + 3 modes) #24
Merged
Closes the PR #21 follow-up. The cap is per-worker-turn (matches the hop-budget scope) and defaults to soft-warn behavior so the calling agent can ask the user how to proceed when the cap is hit.

Three modes:

- warn (default when a cap is set): emit a `worker_tool_cost_warning` ndjson event the first time observed > cap, attach a `cost_cap` block to the answer, and KEEP RUNNING. The response also carries an `operator_prompt` string so the calling agent has a clear signal to surface an AskUserQuestion offering enforce / warn-only / ignore.
- enforce: when observed > cap, the next inner tool_call is refused with a `cost_cap_exceeded` payload (wrapped as a tool_result so the worker sees a consistent envelope). Worker gets one final emission round to produce its answer with whatever it already has.
- off: no checks at all; `cost_cap` block is suppressed entirely (the operator asked to ignore the cap, so the response stays clean).

Scope decision: per-turn, not session-wide. Each top-level confer/coordinate call gets a fresh cap budget, matching how the hop budget already works.

Implementation:

- `_worker_tool_cost_cap_defaults(kwarg_cap, kwarg_mode)` resolves per-call args against CFG.worker_tools.{cost_cap_usd,cost_cap_mode}. cap_usd <= 0 disables; unknown mode falls through to "warn".
- `_worker_tool_cost_observed(aggregated)` pulls cumulative $-cost out of the merged answer's usage block.
- `_worker_tool_cost_cap_refusal(observed, cap)` formats the enforce-mode refusal payload (wrapped as a `<tool_result>` so the worker sees the same envelope shape regardless of outcome).
- `_ask_one_with_tools` grows `cost_cap_usd` + `cost_cap_mode` kwargs. Pre-dispatch check: if enforce + observed > cap, refuse + one final emission round. Post-call check (warn mode): emit the warning event the first time we observe > cap.
- `_request_structured_with_tools` mirrors the same logic and uses a `_finalize()` helper to consistently attach the cost_cap block on every return path.
- `_request_structured` + `_ask_many_parallel` forward the new kwargs through.
- `tool_confer` + `tool_coordinate` accept `worker_tool_cost_cap_usd` + `worker_tool_cost_cap_mode` args. Both surface aggregated cost-cap state on the response under `worker_tools.cost_cap` (with `per_provider` / `per_role` breakdown + an `operator_prompt` when soft-warn was tripped).

Schema additions: `worker_tool_cost_cap_usd` (number, minimum 0) and `worker_tool_cost_cap_mode` (enum) on both confer.input and coordinate.input; descriptions document the warn-default + the agent's AskUserQuestion follow-up responsibility.

Tests (scripts/test_worker_tool_cost_cap.py):

- `_worker_tool_cost_cap_defaults`: CFG fallthrough, per-call override, 0/negative disables, unknown mode falls through to warn
- warn mode: both inner fetches execute, cost_cap block attached with exceeded=true / blocked=false
- enforce mode: only first fetch executes; second gets cost_cap_exceeded; final emission round runs
- off mode: no cost_cap block on the answer (caller asked to ignore)
- No cap set: legacy worker_tools behavior preserved (no block)
- Cap not exceeded: block still attached with exceeded=false
- Structured variant: same behavior inside `_request_structured_with_tools`
- End-to-end coordinate: cost_cap surfaces on `worker_tools.cost_cap` with per_role breakdown; synth role NOT in the per_role list (excluded from worker_tools by design); operator_prompt present when warn-mode tripped

Full suite (36 scripts) passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
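For readers without the diff open, the resolution rules stated for `_worker_tool_cost_cap_defaults` (per-call override, config fallthrough, cap <= 0 disables, unknown mode becomes "warn") could look roughly like the sketch below. This is not the merged code: the function name `resolve_cost_cap`, the `CFG_WORKER_TOOLS` dict, its sample values, and the choice to return `None` for a disabled cap are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for CFG.worker_tools; the real config object and its
# values are not shown in this PR.
CFG_WORKER_TOOLS = {"cost_cap_usd": 0.50, "cost_cap_mode": "warn"}

VALID_MODES = ("warn", "enforce", "off")


def resolve_cost_cap(
    kwarg_cap: Optional[float] = None,
    kwarg_mode: Optional[str] = None,
    cfg: dict = CFG_WORKER_TOOLS,
) -> Tuple[Optional[float], str]:
    """Return (cap_usd or None, mode); per-call arguments win over config."""
    cap = kwarg_cap if kwarg_cap is not None else cfg.get("cost_cap_usd")
    mode = kwarg_mode if kwarg_mode is not None else cfg.get("cost_cap_mode")

    # A missing, zero, or negative cap disables the check (assumed here to be
    # represented as None).
    if cap is None or cap <= 0:
        cap = None

    # Unknown or missing modes fall through to the soft default.
    if mode not in VALID_MODES:
        mode = "warn"

    return cap, mode
```

Passing an explicit `0` as the per-call cap disables the check even when the config sets a positive cap, because the per-call value is resolved first.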
Summary
Per-worker-turn cost cap for the inner-ReAct loop. Defaults to soft-warn so the agent can ask the user how to proceed when the cap is hit (matching the design we discussed).
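Three modes:
- warn (default when a cap is set): emit a `worker_tool_cost_warning` event the first time observed > cap, attach a `cost_cap` block to the answer, and keep running.
- enforce: once observed > cap, refuse the next inner tool_call with a `cost_cap_exceeded` payload, then give the worker one final emission round.
- off: no checks; the `cost_cap` block is suppressed entirely.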
Scope: per-turn, not session-wide. Matches how the hop budget already works.
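The per-turn behavior of the three modes can be illustrated with a small simulation. It is a sketch under stated assumptions, not the inner-ReAct loop from this PR: `run_worker_turn` and the per-call cost list are hypothetical, and of the `cost_cap` fields only `exceeded` and `blocked` are named in the PR's tests (`cap_usd`, `observed_usd`, and `mode` are invented here).

```python
from typing import Dict, List, Optional


def run_worker_turn(
    tool_costs: List[float], cap_usd: Optional[float], mode: str = "warn"
) -> Dict:
    """Simulate one worker turn; tool_costs[i] is a made-up cost for call i."""
    observed = 0.0  # fresh budget each turn: per-turn, not session-wide
    executed = 0
    events: List[str] = []
    blocked = False
    checks_on = cap_usd is not None and mode != "off"

    for cost in tool_costs:
        # Pre-dispatch check (enforce): refuse the next call once over the cap.
        if checks_on and mode == "enforce" and observed > cap_usd:
            events.append("cost_cap_exceeded")
            blocked = True
            break  # the worker would now get one final emission round

        observed += cost
        executed += 1

        # Post-call check (warn): emit the warning only the first time.
        if (
            checks_on
            and mode == "warn"
            and observed > cap_usd
            and "worker_tool_cost_warning" not in events
        ):
            events.append("worker_tool_cost_warning")

    answer: Dict = {"executed": executed, "events": events}
    if checks_on:
        answer["cost_cap"] = {
            "cap_usd": cap_usd,
            "observed_usd": observed,
            "mode": mode,
            "exceeded": observed > cap_usd,
            "blocked": blocked,
        }
    return answer
```

With two calls that each cost more than the cap, warn mode runs both and reports `exceeded` without `blocked`, while enforce mode runs only the first. This mirrors the warn and enforce test cases listed in the commit message.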
Hooked through: `_ask_one_with_tools`, `_request_structured_with_tools` (uses a `_finalize()` helper to attach the cap block on every return path), `_request_structured`, `_ask_many_parallel`, `tool_confer`, `tool_coordinate`. CFG fallback via `worker_tools.cost_cap_usd` + `worker_tools.cost_cap_mode`.
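The `_finalize()` idea, a single helper that every return path goes through so the cap block is never forgotten, might be sketched as follows. The names `make_finalizer` and `structured_request` and the block's field layout are hypothetical; only the pattern itself (attach on every return path, suppress when off or when no cap is set) comes from the description above.

```python
from typing import Callable, Dict, List, Optional


def make_finalizer(
    cap_usd: Optional[float], mode: str, get_observed: Callable[[], float]
) -> Callable[..., Dict]:
    """Build a closure that stamps the cost_cap block onto an answer dict."""

    def _finalize(answer: Dict, blocked: bool = False) -> Dict:
        if cap_usd is None or mode == "off":
            return answer  # no cap, or operator chose to ignore it: stay clean
        observed = get_observed()
        answer["cost_cap"] = {
            "exceeded": observed > cap_usd,
            "blocked": blocked,
        }
        return answer

    return _finalize


def structured_request(
    costs: List[float], cap_usd: Optional[float], mode: str
) -> Dict:
    """Toy request with three return paths, all routed through _finalize."""
    state = {"observed": 0.0}
    finalize = make_finalizer(cap_usd, mode, lambda: state["observed"])

    if not costs:
        return finalize({"result": "no tools used"})  # early return path

    for cost in costs:
        over = cap_usd is not None and state["observed"] > cap_usd
        if mode == "enforce" and over:
            return finalize({"result": "partial"}, blocked=True)  # refusal path
        state["observed"] += cost

    return finalize({"result": "complete"})  # normal path
```

Routing every `return` through the closure keeps the attach-or-suppress decision in one place, which is presumably why the structured variant uses a helper instead of repeating the check at each exit.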
Test plan
🤖 Generated with Claude Code