Skip to content

Per-turn worker tool-use cost cap (soft default + 3 modes)#24

Merged
fxspeiser merged 1 commit into
mainfrom
feature/worker-tool-cost-cap
May 26, 2026
Merged

Per-turn worker tool-use cost cap (soft default + 3 modes)#24
fxspeiser merged 1 commit into
mainfrom
feature/worker-tool-cost-cap

Conversation

@fxspeiser
Copy link
Copy Markdown
Owner

Summary

Per-worker-turn cost cap for the inner-ReAct loop. Defaults to soft-warn so the agent can ask the user how to proceed when the cap is hit (matching the design we discussed).

Three modes:

  • `warn` (default when a cap is set): emit `worker_tool_cost_warning` event the first time observed > cap, attach a `cost_cap` block to the answer with an `operator_prompt` field, keep running. The agent's signal to surface AskUserQuestion offering enforce / warn-only / ignore.
  • `enforce`: when observed > cap, the next inner `<tool_call>` gets refused with a `cost_cap_exceeded` payload. Worker gets one final emission round.
  • `off`: no checks; `cost_cap` block suppressed entirely (operator asked to ignore the cap).

Scope: per-turn, not session-wide. Matches how the hop budget already works.

Hooked through: `_ask_one_with_tools`, `_request_structured_with_tools` (uses a `_finalize()` helper to attach the cap block on every return path), `_request_structured`, `_ask_many_parallel`, `tool_confer`, `tool_coordinate`. CFG fallback via `worker_tools.cost_cap_usd` + `worker_tools.cost_cap_mode`.

Test plan

  • New `scripts/test_worker_tool_cost_cap.py` covers defaults helper, warn / enforce / off / no-cap / under-cap, structured variant, end-to-end coordinate with per-role breakdown (synth correctly excluded)
  • Full suite (36 scripts) passes locally

🤖 Generated with Claude Code

Closes the PR #21 follow-up. The cap is per-worker-turn (matches the
hop-budget scope) and defaults to soft-warn behavior so the calling
agent can ask the user how to proceed when the cap is hit.

Three modes:
- warn (default when a cap is set): emit a `worker_tool_cost_warning`
  ndjson event the first time observed > cap, attach a `cost_cap`
  block to the answer, and KEEP RUNNING. The response also carries an
  `operator_prompt` string so the calling agent has a clear signal to
  surface an AskUserQuestion offering enforce / warn-only / ignore.
- enforce: when observed > cap, the next inner tool_call is refused
  with a `cost_cap_exceeded` payload (wrapped as a tool_result so the
  worker sees a consistent envelope). Worker gets one final emission
  round to produce its answer with whatever it already has.
- off: no checks at all; `cost_cap` block is suppressed entirely
  (the operator asked to ignore the cap, so the response stays clean).

Scope decision: per-turn, not session-wide. Each top-level
confer/coordinate call gets a fresh cap budget, matching how the
hop budget already works.

Implementation:
- `_worker_tool_cost_cap_defaults(kwarg_cap, kwarg_mode)` resolves
  per-call args against CFG.worker_tools.{cost_cap_usd,cost_cap_mode}.
  cap_usd <= 0 disables; unknown mode falls through to "warn".
- `_worker_tool_cost_observed(aggregated)` pulls cumulative
  $-cost out of the merged answer's usage block.
- `_worker_tool_cost_cap_refusal(observed, cap)` formats the
  enforce-mode refusal payload (wrapped as a `<tool_result>` so the
  worker sees the same envelope shape regardless of outcome).
- `_ask_one_with_tools` grows `cost_cap_usd` + `cost_cap_mode`
  kwargs. Pre-dispatch check: if enforce + observed > cap, refuse +
  one final emission round. Post-call check (warn mode): emit the
  warning event the first time we observe > cap.
- `_request_structured_with_tools` mirrors the same logic and uses
  a `_finalize()` helper to consistently attach the cost_cap block
  on every return path.
- `_request_structured` + `_ask_many_parallel` forward the new
  kwargs through.
- `tool_confer` + `tool_coordinate` accept
  `worker_tool_cost_cap_usd` + `worker_tool_cost_cap_mode` args.
  Both surface aggregated cost-cap state on the response under
  `worker_tools.cost_cap` (with `per_provider` / `per_role` breakdown
  + an `operator_prompt` when soft-warn was tripped).

Schema additions: `worker_tool_cost_cap_usd` (number, minimum 0) and
`worker_tool_cost_cap_mode` (enum) on both confer.input and
coordinate.input; descriptions document the warn-default + the agent's
AskUserQuestion follow-up responsibility.

Tests (scripts/test_worker_tool_cost_cap.py):
- `_worker_tool_cost_cap_defaults`: CFG fallthrough, per-call override,
  0/negative disables, unknown mode falls through to warn
- warn mode: both inner fetches execute, cost_cap block attached with
  exceeded=true / blocked=false
- enforce mode: only first fetch executes; second gets
  cost_cap_exceeded; final emission round runs
- off mode: no cost_cap block on the answer (caller asked to ignore)
- No cap set: legacy worker_tools behavior preserved (no block)
- Cap not exceeded: block still attached with exceeded=false
- Structured variant: same behavior inside
  `_request_structured_with_tools`
- End-to-end coordinate: cost_cap surfaces on `worker_tools.cost_cap`
  with per_role breakdown; synth role NOT in the per_role list
  (excluded from worker_tools by design); operator_prompt present
  when warn-mode tripped

Full suite (36 scripts) passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@fxspeiser fxspeiser merged commit 3b34377 into main May 26, 2026
1 check passed
@fxspeiser fxspeiser deleted the feature/worker-tool-cost-cap branch May 26, 2026 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant