Skip to content

Add worker tool-use loop (constrained inner ReAct, V1)#21

Merged
fxspeiser merged 1 commit into
mainfrom
feature/worker-tool-loop
May 26, 2026
Merged

Add worker tool-use loop (constrained inner ReAct, V1)#21
fxspeiser merged 1 commit into
mainfrom
feature/worker-tool-loop

Conversation

@fxspeiser
Copy link
Copy Markdown
Owner

Summary

Workers can request `fetch` or `verify` mid-turn via:

```
<tool_call>{"name": "fetch", "args": {"url": "https://..."}}</tool_call>
```

The server intercepts the tag, executes against the same session's policies (fetch allowlist, egress budget, breakers), wraps the result in `<tool_result><untrusted_input>...</untrusted_input></tool_result>`, and re-prompts. The Anthropic critic flagged attack-surface expansion; V1 is scoped tight.

Hard constraints:

  • Allowlist = `{fetch, verify}` only. No `solve` (sandboxed shell), no LLM-spawning tools (no recursive ReAct).
  • Hop budget = 2 per worker turn. 3rd request gets a structured refusal; worker gets one final round to produce an answer.
  • Opt-in via `worker_tools: [...]` on `tool_confer`. Default = no tool use, no behavior change. Accepted/rejected split echoed back on the response.
  • Untrusted-input wrap + `_neutralize_injection` on every inner result. Existing canary leak detection on the final response covers nonce leaks from inner fetches.
  • Coordinate NOT wired in V1: its JSON-schema-validated role envelopes don't compose cleanly with mid-turn tool tags. Revisit if requested.

Test plan

  • New `scripts/test_worker_tools.py` covers: tag parsing (happy / no-tag / malformed), allowlist gates (including every LLM-spawning tool name), happy 1-hop path, 2-hop + budget exhaustion + final round, malformed envelope recovery, empty worker_tools identity behavior, `tool_confer` surface with accepted/rejected split
  • Full suite (33 scripts) passes locally

🤖 Generated with Claude Code

Lets a worker emit `<tool_call>{"name": "TOOL", "args": {...}}</tool_call>`
mid-turn to fetch evidence or run a deterministic check; the server
intercepts the call, executes against the same session's policies,
wraps the result as untrusted-input, and re-prompts. Hard limits keep
the attack surface narrow.

Scope (V1, tight on purpose):
- HARD allowlist of inner tools: `fetch` and `verify` only. Both are
  read-only / deterministic. `solve` (sandboxed shell), `coordinate`,
  `audit`, `delegate`, `create*`, and every other LLM-spawning tool
  are explicitly rejected — no recursive ReAct.
- HARD hop budget: at most 2 inner calls per worker turn. A 3rd request
  gets a structured refusal payload re-prompted to the worker, which
  then gets one final round to produce an answer.
- Opt-in: callers pass `worker_tools: ["fetch", "verify"]` on `confer`.
  Default = no tool use (existing behavior unchanged). Coordinate is
  NOT wired — its role envelopes are JSON-schema-validated and don't
  compose cleanly with mid-turn tool tags; revisit if needed.
- Untrusted-input wrap: every tool result is wrapped in
  `<tool_result name="..."><untrusted_input>...</untrusted_input></tool_result>`
  with `_neutralize_injection` applied. Canary leak detection on the
  final response covers any nonce that leaks via inner fetches.
- Telemetry: every inner call emits a `worker_inner_call` event under
  the same session_id; aggregate usage across re-prompts is summed
  into the answer's `usage` block.

Implementation:
- `_extract_tool_call(text)` -> (parsed_dict | None, error_msg | None).
  Tag-present-but-broken-JSON returns a parse_error string so the
  caller can surface a structured refusal back to the worker.
- `_worker_tools_dispatch(call, session_id)` -> wrapped tool_result
  string. Enforces the allowlist; refusals also come back as
  `<tool_result>{"refused": true, ...}</tool_result>` so the worker
  sees a consistent envelope regardless of outcome.
- `_ask_one_with_tools(provider, messages, deadline, max_tokens,
  purpose, worker_tools, session_id)` runs the loop. Returns the same
  answer shape as `_ask_one` with usage/timing summed across hops, plus
  `inner_tool_calls: [{hop, name, status}]` for visibility.
- `_ask_many_parallel` grows optional `worker_tools` + `session_id`
  kwargs; passes them through to per-worker dispatch.
- `tool_confer` accepts a new `worker_tools: [...]` arg. The list is
  filtered to the allowlist; the accepted/rejected split is surfaced
  back on the response as `worker_tools: {accepted, rejected, hop_budget}`
  so callers can verify what's actually enabled.

Tests (scripts/test_worker_tools.py):
- `_extract_tool_call`: parses valid envelope, returns (None,None) on
  no-tag, returns (None,error) on tag-present-but-broken-JSON
- Allowlist gates: `coordinate`, `solve`, and every other LLM-spawning
  tool name returns a refusal payload (loop over the FULL tool surface)
- Happy path: one fetch call -> wrapped tool_result re-prompted -> final
  answer; usage summed across both hops; fetched URL was real
- Hop budget: 3 consecutive requests -> first 2 execute, 3rd refused
  with `hop_budget_exhausted`; worker gets one more round to finalize
- Malformed JSON in tool_call: parse_error refusal, worker can correct
  on next hop
- Empty worker_tools list: identity behavior, no system hint injected,
  no `inner_tool_calls` field on the answer
- `tool_confer` surface: `worker_tools` arg filtered against allowlist,
  accepted/rejected split echoed back on the response

Full suite (33 scripts) passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@fxspeiser fxspeiser merged commit 882e6d2 into main May 26, 2026
1 check passed
@fxspeiser fxspeiser deleted the feature/worker-tool-loop branch May 26, 2026 13:26
fxspeiser added a commit that referenced this pull request May 26, 2026
Closes the follow-up logged in PR #21: coordinate now supports the
bounded inner-ReAct loop, but only on the roles where evidence
gathering is meaningful.

Design choice: tool calls happen BEFORE the structured JSON emission.
A `<tool_call>` tag in the middle of a role envelope would break
schema validation, so `_request_structured_with_tools` interleaves
the hop loop with schema parsing — tool calls each cost one hop;
the response without a tool_call tag is parsed against the schema
(with the existing retry-once-on-validation-failure semantics).

Roles:
- PROPOSER + CRITICS get worker_tools when enabled. They benefit from
  fetching evidence to ground their position.
- SYNTHESIZER is intentionally EXCLUDED. Its job is purely
  combinatorial — combining proposer + critique output. Letting it
  call tools opens scope creep and external-call cost on what should
  be a pure reduction step.

Implementation:
- `_request_structured` grows kwargs `worker_tools` + `session_id`.
  When `worker_tools` is non-empty (after allowlist filtering) it
  delegates to `_request_structured_with_tools`.
- `_request_structured_with_tools` interleaves the hop loop with
  schema parsing:
  * tool_call tag present + within budget -> dispatch, wrap, re-prompt
  * tool_call tag present + budget exhausted -> refusal + one final
    emission round; that final attempt is parsed once, no retry
  * no tool_call tag -> attempt JSON parse + schema validation; on
    failure, re-prompt once with the validation errors (the standard
    `_request_structured` retry path)
- The system message hint enforces "tool_call envelopes only BEFORE
  the final JSON object" so the worker doesn't try to mix them.
- `tool_coordinate` accepts `worker_tools: [...]` arg, filters against
  the hard allowlist, and surfaces `worker_tools: {accepted, rejected,
  hop_budget, applies_to: ["proposer", "critic"]}` back on the
  response — operators see explicitly that synth is excluded.
- Tool calls roll up under the same session_id (cost, breakers, fetch
  egress budget all apply to inner calls).

Schema:
- `coordinate.input.worker_tools` added with `enum: ["fetch","verify"]`
  and a note that synth is excluded.

Tests (scripts/test_coordinate_worker_tools.py):
- `_request_structured_with_tools` happy path (one tool_call -> valid
  envelope)
- Hop budget exhaustion inside the structured loop -> 3rd request
  refused, worker still produces a valid envelope on the final round
- worker_tools empty -> identity behavior (no tool-call hint injected;
  no `inner_tool_calls` on the answer)
- Schema validation retry preserved on the tool-use path
- End-to-end coordinate: proposer + both critics each fetch once; synth
  dispatch does NOT carry the tool-call hint in its system prompt;
  inner_tool_calls recorded on per-role answers
- Legacy coordinate (no worker_tools arg): no metadata on the response,
  no extra prompt scaffolding, no fetches

Full suite (35 scripts) passes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
fxspeiser added a commit that referenced this pull request May 26, 2026
Closes the PR #21 follow-up. The cap is per-worker-turn (matches the
hop-budget scope) and defaults to soft-warn behavior so the calling
agent can ask the user how to proceed when the cap is hit.

Three modes:
- warn (default when a cap is set): emit a `worker_tool_cost_warning`
  ndjson event the first time observed > cap, attach a `cost_cap`
  block to the answer, and KEEP RUNNING. The response also carries an
  `operator_prompt` string so the calling agent has a clear signal to
  surface an AskUserQuestion offering enforce / warn-only / ignore.
- enforce: when observed > cap, the next inner tool_call is refused
  with a `cost_cap_exceeded` payload (wrapped as a tool_result so the
  worker sees a consistent envelope). Worker gets one final emission
  round to produce its answer with whatever it already has.
- off: no checks at all; `cost_cap` block is suppressed entirely
  (the operator asked to ignore the cap, so the response stays clean).

Scope decision: per-turn, not session-wide. Each top-level
confer/coordinate call gets a fresh cap budget, matching how the
hop budget already works.

Implementation:
- `_worker_tool_cost_cap_defaults(kwarg_cap, kwarg_mode)` resolves
  per-call args against CFG.worker_tools.{cost_cap_usd,cost_cap_mode}.
  cap_usd <= 0 disables; unknown mode falls through to "warn".
- `_worker_tool_cost_observed(aggregated)` pulls cumulative
  $-cost out of the merged answer's usage block.
- `_worker_tool_cost_cap_refusal(observed, cap)` formats the
  enforce-mode refusal payload (wrapped as a `<tool_result>` so the
  worker sees the same envelope shape regardless of outcome).
- `_ask_one_with_tools` grows `cost_cap_usd` + `cost_cap_mode`
  kwargs. Pre-dispatch check: if enforce + observed > cap, refuse +
  one final emission round. Post-call check (warn mode): emit the
  warning event the first time we observe > cap.
- `_request_structured_with_tools` mirrors the same logic and uses
  a `_finalize()` helper to consistently attach the cost_cap block
  on every return path.
- `_request_structured` + `_ask_many_parallel` forward the new
  kwargs through.
- `tool_confer` + `tool_coordinate` accept
  `worker_tool_cost_cap_usd` + `worker_tool_cost_cap_mode` args.
  Both surface aggregated cost-cap state on the response under
  `worker_tools.cost_cap` (with `per_provider` / `per_role` breakdown
  + an `operator_prompt` when soft-warn was tripped).

Schema additions: `worker_tool_cost_cap_usd` (number, minimum 0) and
`worker_tool_cost_cap_mode` (enum) on both confer.input and
coordinate.input; descriptions document the warn-default + the agent's
AskUserQuestion follow-up responsibility.

Tests (scripts/test_worker_tool_cost_cap.py):
- `_worker_tool_cost_cap_defaults`: CFG fallthrough, per-call override,
  0/negative disables, unknown mode falls through to warn
- warn mode: both inner fetches execute, cost_cap block attached with
  exceeded=true / blocked=false
- enforce mode: only first fetch executes; second gets
  cost_cap_exceeded; final emission round runs
- off mode: no cost_cap block on the answer (caller asked to ignore)
- No cap set: legacy worker_tools behavior preserved (no block)
- Cap not exceeded: block still attached with exceeded=false
- Structured variant: same behavior inside
  `_request_structured_with_tools`
- End-to-end coordinate: cost_cap surfaces on `worker_tools.cost_cap`
  with per_role breakdown; synth role NOT in the per_role list
  (excluded from worker_tools by design); operator_prompt present
  when warn-mode tripped

Full suite (36 scripts) passes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
fxspeiser added a commit that referenced this pull request May 26, 2026
Closes the PR #21 follow-up. Today the gateway only enforces the
allowlist; bad args fell through to the inner tool which returned its
own ad-hoc error envelope. Workers got inconsistent refusal shapes
across rejection reasons (allowlist vs. schema vs. hop budget vs. cost
cap) and the inner tool burned code paths handling garbage.

The fix:
- Look up the inner tool's `inputSchema` from `TOOLS[name]` and run
  the existing boundary validator `_validate_input` BEFORE dispatch.
- On validation failure, return a refusal payload carrying a
  structured `schema_error` field with the validator message — the
  worker can read it programmatically and self-correct on the next
  hop (within the remaining hop budget).
- Emit a `worker_inner_validation_fail` ndjson event so operators can
  see when a worker is repeatedly misusing inner tools.

Validation order is: allowlist -> input schema -> session_id injection
-> dispatch. session_id is added AFTER the schema check so a missing
session_id arg never trips the validator (it's always optional on the
inner tools).

API change:
- `_worker_tools_refusal` grows an optional `schema_error` keyword
  argument; the payload gains a `schema_error` field when set.
- New helper `_worker_tool_input_schema(name)` for the lookup (defensive
  against module-load order: returns empty dict if TOOLS isn't bound yet).

Tests (scripts/test_worker_tool_arg_validation.py):
- `fetch` with no `url`: refusal with schema_error mentioning the missing field
- `fetch.url` wrong type: refusal mentioning type + field
- `verify` missing `checks`: refusal
- `verify.checks` wrong type: refusal mentioning array
- Unknown extra arg on `fetch` (additionalProperties=false): refusal
- VALID `fetch` args dispatch normally with session_id injected
- VALID `verify` args dispatch normally
- Allowlist refusal still works AND has no `schema_error` field (it
  short-circuits before the schema check)
- End-to-end recovery: worker emits invalid call, gets refusal with
  schema_error, recovers on next hop with valid args; only the valid
  call reaches the inner tool

Full suite (37 scripts) passes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant