feat(llm): support parallel tool calls #40
- LlmResponse.tool_call (Option) replaced by tool_calls (Vec): empty means final answer, N entries means parallel tool use
- ConversationTurn::AssistantToolCall now carries calls: Vec<ToolCall> instead of a single id/name/arguments triple
- EntryContent::ToolCall stores calls: Vec<MemoryToolCall> so a whole parallel batch is saved as one assistant memory entry
- Anthropic client: disable_parallel_tool_use removed; parse_blocks collects all ToolUse blocks; streaming accumulates per block index via BTreeMap; consecutive ToolResult turns merged into one user message (required by the API for parallel results)
- OpenAI client: parallel_tool_calls field removed (API defaults to enabled); complete() and streaming accumulator both collect all calls
- Agent loop dispatches all tool calls concurrently via join_all, storing one ToolCall batch entry and one ToolResult entry per result
- futures crate added to workspace for join_all
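To make the type changes listed above concrete, here is a minimal sketch of the new shapes. It is an illustration under assumptions, not the PR's actual definitions: the `content` field, the `is_final` and `to_turn` helpers, and the use of a raw `String` for `arguments` (to avoid a serde_json dependency) are all hypothetical.

```rust
/// One tool invocation requested by the model.
/// Assumption: `arguments` is a raw JSON string here; the real type is likely a JSON value.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String,
}

/// Response shape after the change: an empty `tool_calls` means a final answer.
#[derive(Debug, Clone, PartialEq)]
pub struct LlmResponse {
    /// Hypothetical field holding the assistant text, if any.
    pub content: Option<String>,
    pub tool_calls: Vec<ToolCall>,
}

impl LlmResponse {
    /// Hypothetical helper: no tool calls means the model is done.
    pub fn is_final(&self) -> bool {
        self.tool_calls.is_empty()
    }
}

/// Conversation turns; only the variants relevant to tool use are modelled.
#[derive(Debug, Clone, PartialEq)]
pub enum ConversationTurn {
    User(String),
    /// A whole parallel batch travels as one assistant turn.
    AssistantToolCall { calls: Vec<ToolCall> },
    ToolResult { tool_call_id: String, content: String },
}

/// Hypothetical helper: convert a non-final response into a single batch turn.
pub fn to_turn(resp: &LlmResponse) -> Option<ConversationTurn> {
    if resp.is_final() {
        None
    } else {
        Some(ConversationTurn::AssistantToolCall {
            calls: resp.tool_calls.clone(),
        })
    }
}
```

Using an empty `Vec` for the final-answer case removes the need for a separate `Option` wrapper, so callers branch on `is_empty()` alone.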
Confidence Score: 4/5
Safe to merge with the streaming error-handling fix; the P1 only surfaces on malformed server-sent JSON in the OpenAI streaming path.
One P1 (silent tool-call drop on parse error in OpenAI streaming) caps the score at 4. All other changes are correct and well-tested, including the Anthropic path, memory model, and agent loop.
crates/llm/src/openai.rs: accumulated_response function, around line 262
Sequence Diagram
sequenceDiagram
participant AL as AgentLoop
participant LLM as LlmClient
participant MEM as SessionMemory
participant T1 as Tool[0]
participant T2 as Tool[1]
AL->>LLM: complete(turns, tools)
LLM-->>AL: LlmResponse { tool_calls: [TC1, TC2] }
AL->>MEM: add_entry(ToolCall { calls: [TC1, TC2], reasoning })
par Concurrent dispatch
AL->>T1: dispatch(TC1.name, TC1.args)
AL->>T2: dispatch(TC2.name, TC2.args)
end
T1-->>AL: result1
T2-->>AL: result2
AL->>MEM: add_entry(ToolResult { tool_call_id: TC1.id, content: result1 })
AL->>MEM: add_entry(ToolResult { tool_call_id: TC2.id, content: result2 })
Note over AL,MEM: On next turn, Anthropic merges consecutive ToolResult turns into one user message
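The merge step in the note above can be sketched as follows. This is a simplified illustration, assuming hypothetical `Turn`, `Block`, and `Message` types rather than the ones in anthropic.rs: each run of consecutive ToolResult turns becomes one user message containing one block per result.

```rust
/// Hypothetical, simplified conversation turn.
#[derive(Debug, Clone, PartialEq)]
pub enum Turn {
    User(String),
    Assistant(String),
    ToolResult { tool_call_id: String, content: String },
}

/// Hypothetical content block inside an outgoing message.
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Text(String),
    ToolResult { tool_use_id: String, content: String },
}

/// Hypothetical outgoing message.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: &'static str,
    pub blocks: Vec<Block>,
}

/// Convert turns to messages, folding consecutive tool results into one user message.
pub fn to_messages(turns: &[Turn]) -> Vec<Message> {
    let mut out: Vec<Message> = Vec::new();
    // True only while the last pushed message is a run of tool results.
    let mut last_is_results = false;
    for turn in turns {
        match turn {
            Turn::ToolResult { tool_call_id, content } => {
                let block = Block::ToolResult {
                    tool_use_id: tool_call_id.clone(),
                    content: content.clone(),
                };
                if last_is_results {
                    // Extend the existing tool-result message instead of starting a new one.
                    out.last_mut()
                        .expect("a tool-result message was just pushed")
                        .blocks
                        .push(block);
                } else {
                    out.push(Message { role: "user", blocks: vec![block] });
                    last_is_results = true;
                }
            }
            Turn::User(text) => {
                out.push(Message { role: "user", blocks: vec![Block::Text(text.clone())] });
                last_is_results = false;
            }
            Turn::Assistant(text) => {
                out.push(Message { role: "assistant", blocks: vec![Block::Text(text.clone())] });
                last_is_results = false;
            }
        }
    }
    out
}
```

A plain user turn that follows the results starts its own message in this sketch; whether the real client folds it into the same message is not stated in the PR.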
Reviews (1): Last reviewed commit: "feat(llm): support parallel tool calls a..."
let arguments = if args_json.is_empty() {
    serde_json::json!({})
} else {
    serde_json::from_str(&args_json).ok()?
Silent error swallowing on malformed streamed tool-call JSON
.ok()? inside filter_map converts a parse failure to None, causing the tool call to be silently dropped from tool_calls with no error and no log warning. The Anthropic streaming path (lines 435–436 in anthropic.rs) correctly uses .with_context(...)? on a map(...).collect::<Result<_>>()? chain, which propagates the error to the caller. A bad JSON chunk here will produce a response that appears to have fewer tool calls than the model intended, leading to silent data loss.
The fix is to have the closure yield a Result for each complete call (instead of dropping failures with .ok()?) and to collect as Result<Vec<_>>, mirroring the Anthropic pattern:
let tool_calls = accumulator
    .tools
    .into_values()
    // Entries missing an id or name are still skipped; parse failures now propagate.
    .filter_map(|(id, name, args_json)| match (id, name) {
        (Some(id), Some(name)) => {
            let arguments = if args_json.is_empty() {
                Ok(serde_json::json!({}))
            } else {
                serde_json::from_str(&args_json)
                    .context("parsing streamed tool call arguments")
            };
            Some(arguments.map(|arguments| ToolCall { id, name, arguments }))
        }
        _ => None,
    })
    .collect::<Result<Vec<_>>>()?;
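The difference between dropping and propagating a parse failure can be shown with a dependency-free sketch. Everything here is a stand-in: `Call`, `Raw`, `collect_lossy`, and `collect_strict` are hypothetical names, and integer parsing takes the place of JSON parsing so the example needs only the standard library.

```rust
use std::num::ParseIntError;

/// Stand-in for an accumulated streamed call: (id, name, raw argument text).
pub type Raw = (Option<String>, Option<String>, String);

/// Stand-in for a parsed tool call; `argument` replaces the JSON arguments.
#[derive(Debug, PartialEq)]
pub struct Call {
    pub id: String,
    pub name: String,
    pub argument: i64,
}

/// Lossy variant: `.ok()?` turns a parse failure into `None`, so the call vanishes.
pub fn collect_lossy(raw: Vec<Raw>) -> Vec<Call> {
    raw.into_iter()
        .filter_map(|(id, name, text)| match (id, name) {
            (Some(id), Some(name)) => {
                let argument = text.parse::<i64>().ok()?;
                Some(Call { id, name, argument })
            }
            _ => None,
        })
        .collect()
}

/// Strict variant: incomplete entries are skipped, but the first parse failure is returned.
pub fn collect_strict(raw: Vec<Raw>) -> Result<Vec<Call>, ParseIntError> {
    raw.into_iter()
        .filter_map(|(id, name, text)| match (id, name) {
            (Some(id), Some(name)) => Some(
                text.parse::<i64>()
                    .map(|argument| Call { id, name, argument }),
            ),
            _ => None,
        })
        // Collecting into `Result<Vec<_>, _>` stops at the first `Err`.
        .collect()
}
```

With one valid and one malformed entry, the lossy variant returns a single call and no error, while the strict variant returns `Err`, which is the behaviour the review comment asks for.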
Pull request overview
Enables end-to-end support for parallel tool calls across the LLM clients, agent loop, and session memory, removing the provider-side flags that previously forced tool calls to be serialized.
Changes:
- Update core types to represent tool calls as batches (Vec<ToolCall>/Vec<MemoryToolCall>) instead of single calls.
- Teach OpenAI + Anthropic clients to serialize/parse multiple tool calls, including streaming accumulation.
- Dispatch tool calls concurrently in the agent loop and store one ToolCall batch entry plus per-call ToolResult entries (with Anthropic ToolResult message merging support).
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/memory/src/lib.rs | Store tool calls as a batch in memory via calls: Vec<MemoryToolCall>. |
| crates/llm/src/lib.rs | Change LlmResponse.tool_call → tool_calls: Vec<ToolCall> and update final-answer logic. |
| crates/llm/src/openai.rs | Remove parallel_tool_calls suppression; add multi-tool-call (incl. streaming) support. |
| crates/llm/src/anthropic.rs | Remove disable_parallel_tool_use; support multiple tool_use blocks and merge consecutive ToolResult turns per API requirements. |
| crates/llm/tests/stub_tests.rs | Update tests for tool_calls semantics and serialization expectations. |
| crates/agent-loop/src/lib.rs | Dispatch tool calls concurrently (join_all) and store batch ToolCall + individual ToolResults. |
| crates/agent-loop/tests/agent_loop_tests.rs | Add coverage for parallel tool call dispatch + memory storage layout. |
| crates/agent-loop/Cargo.toml | Add futures dependency for concurrent dispatch. |
| Cargo.toml | Add workspace dependency on futures. |
| Cargo.lock | Lockfile updates for futures and related crates. |
for tool_call in tool_calls {
    // Track and filter by the first tool call's index so parallel
    // tool calls don't contaminate tool_arguments with a second
    // call's JSON, which would produce invalid concatenated JSON.
    if let Some(idx) = tool_call.index {
        match accumulator.tool_block_index {
            None => accumulator.tool_block_index = Some(idx),
            Some(first) if idx != first => {
                warn!(
                    first_index = first,
                    skipped_index = idx,
                    "ignoring parallel tool call delta"
                );
                continue;
            }
            _ => {}
        }
    } else if accumulator.tool_block_index.is_some() {
        // index absent but we already captured a call; treat as belonging to it
    }
    let idx = tool_call.index.unwrap_or(0);
    let entry = accumulator
        .tools
        .entry(idx)
        .or_insert((None, None, String::new()));
        (Some(id), Some(name)) => {
            let arguments = if args_json.is_empty() {
                serde_json::json!({})
            } else {
                serde_json::from_str(&args_json).ok()?
            };
            Some(ToolCall {
                id,
                name,
                arguments,
            })
        }
        _ => None,
    })
    .collect();
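The per-index accumulation in the hunks above can be sketched in isolation. This is a minimal illustration, assuming a hypothetical `ToolCallDelta` shape and hypothetical `push`/`finish` methods; only the `tools` map with its `(Option<id>, Option<name>, String)` entries and the `unwrap_or(0)` default come from the diff.

```rust
use std::collections::BTreeMap;

/// Hypothetical streamed delta for one tool call; field names are assumptions.
#[derive(Debug, Clone, Default)]
pub struct ToolCallDelta {
    pub index: Option<u32>,
    pub id: Option<String>,
    pub name: Option<String>,
    pub arguments_fragment: Option<String>,
}

/// Accumulates deltas keyed by call index: (id, name, concatenated argument JSON).
#[derive(Debug, Default)]
pub struct Accumulator {
    pub tools: BTreeMap<u32, (Option<String>, Option<String>, String)>,
}

impl Accumulator {
    /// Route each delta to its own slot so parallel calls never share a buffer.
    pub fn push(&mut self, delta: ToolCallDelta) {
        let idx = delta.index.unwrap_or(0);
        let entry = self
            .tools
            .entry(idx)
            .or_insert((None, None, String::new()));
        if delta.id.is_some() {
            entry.0 = delta.id;
        }
        if delta.name.is_some() {
            entry.1 = delta.name;
        }
        if let Some(fragment) = delta.arguments_fragment {
            entry.2.push_str(&fragment);
        }
    }

    /// Return complete calls in index order as (id, name, raw argument JSON).
    pub fn finish(self) -> Vec<(String, String, String)> {
        self.tools
            .into_values()
            .filter_map(|(id, name, args)| Some((id?, name?, args)))
            .collect()
    }
}
```

A `BTreeMap` keeps the calls ordered by index regardless of the order in which deltas arrive, which a `HashMap` would not guarantee.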
Summary
Both providers were permanently suppressing parallel tool calls (disable_parallel_tool_use: true / parallel_tool_calls: false), which kept the implementation correct but prevented the model from issuing independent tool calls in a single turn. This removes those flags and wires up the full stack to handle parallel calls properly.
LlmResponse.tool_calls is now a Vec (empty = final answer, N = calls); ConversationTurn::AssistantToolCall carries calls: Vec<ToolCall>; EntryContent::ToolCall stores the whole batch as one memory entry. The Anthropic client merges consecutive ToolResult turns into a single user message (required by the API). The agent loop dispatches all calls concurrently via futures::future::join_all and stores individual result entries.
Closes #39
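The dispatch step can be approximated without an async runtime. The sketch below deliberately swaps join_all for scoped standard-library threads so that it stays dependency-free; it is not the agent loop's code. `dispatch_all` and the `run_tool` callback are hypothetical, and `arguments` is a plain string by assumption. What it shares with join_all is that all calls start before any result is awaited and results come back in call order.

```rust
use std::thread;

/// Simplified tool call; `arguments` is a plain string by assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String,
}

/// One result entry per call, linked back by `tool_call_id`.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolResult {
    pub tool_call_id: String,
    pub content: String,
}

/// Run every call on its own scoped thread and return results in call order.
/// Stand-in for the async `join_all` dispatch described in the PR.
pub fn dispatch_all<F>(calls: &[ToolCall], run_tool: F) -> Vec<ToolResult>
where
    F: Fn(&ToolCall) -> String + Sync,
{
    let run_tool = &run_tool;
    thread::scope(|scope| {
        // Spawn everything first so the calls overlap in time.
        let handles: Vec<_> = calls
            .iter()
            .map(|call| scope.spawn(move || run_tool(call)))
            .collect();
        // Join in spawn order, pairing each output with the call that produced it.
        handles
            .into_iter()
            .zip(calls)
            .map(|(handle, call)| ToolResult {
                tool_call_id: call.id.clone(),
                content: handle
                    .join()
                    .unwrap_or_else(|_| "tool panicked".to_string()),
            })
            .collect()
    })
}
```

Because results are paired with calls by position, each ToolResult can carry the matching tool_call_id even when tools finish out of order.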