fix: improve Responses API streaming event lifecycle and input handling#63

Open
ivanopcode wants to merge 7 commits into teabranch:main from ivanopcode:fix/responses-api-streaming-lifecycle

Conversation

@ivanopcode ivanopcode commented Apr 9, 2026

Problem

ORS did not fully support some Responses API request and event patterns used
by tool-calling clients and open-weight reasoning models such as gpt-oss.

In practice this caused three classes of problems:

  1. Input history was reconstructed incompletely.
    Clients send prior tool calls, tool results, and developer messages as
    input items on each turn. ORS only handled a subset of these items, so
    parts of the conversation history were dropped before reaching the backend.

  2. The streamed Responses event lifecycle was incomplete.
    Several expected events and state transitions were missing or inconsistent,
    especially around tool calls and text output items.

  3. reasoning_content was not preserved across tool-call turns.
    For models that emit reasoning separately from the final answer, losing that
    context degraded multi-step tool use. This is sometimes referred to in the
    community as "CoT passback".

These issues were reproduced with Codex CLI, but the fixes bring ORS closer
to the Responses API model more generally.

Changes

This PR updates the Responses adapter to:

  • convert function_call, function_call_output, and developer input items
    into the corresponding chat-completions message structure
  • emit a more complete Responses streaming lifecycle for tool calls and text
    output, including response.output_item.added,
    response.function_call_arguments.done,
    response.output_text.done, and response.output_item.done
  • use consistent item_id values across streamed tool-call events
  • replace the non-spec ready status with in_progress / completed
  • cache and reinject reasoning_content across tool-call turns when the model
    provides it
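The item-to-message conversion in the first bullet can be sketched roughly as follows. This is an illustrative outline, not the actual adapter code; the real logic lives in `convert_responses_to_chat_completions()` and handles more item types and fallbacks.

```python
# Hypothetical sketch of the input-item conversion described above.
import json

def convert_input_items(items):
    """Map Responses API input items onto chat-completions messages."""
    messages = []
    for item in items:
        item_type = item.get("type")
        if item_type == "function_call":
            # Prior tool call replayed by the client -> assistant tool_calls entry
            messages.append({
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": item.get("call_id") or item.get("id"),
                    "type": "function",
                    "function": {
                        "name": item.get("name", ""),
                        "arguments": item.get("arguments", "{}"),
                    },
                }],
            })
        elif item_type == "function_call_output":
            # Tool result -> tool message keyed by the same call id
            output = item.get("output", "")
            if not isinstance(output, str):
                output = json.dumps(output)
            messages.append({
                "role": "tool",
                "tool_call_id": item.get("call_id") or item.get("id"),
                "content": output,
            })
        elif item_type == "message" and item.get("role") == "developer":
            # Developer instructions map onto the system role
            messages.append({"role": "system", "content": item.get("content", "")})
    return messages
```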

Testing

Tested with:

  • uv run pytest tests/test_responses_service.py

Also verified manually with Codex CLI against local llama.cpp-backed models,
including multi-turn tool-calling flows where prior tool calls and reasoning
need to survive across turns.

@qodo-code-review

Review Summary by Qodo

Improve Responses API streaming lifecycle and input handling

🐞 Bug fix ✨ Enhancement


Walkthroughs

Description
• Improved input handling for function_call, function_call_output, and developer message types
  in Responses API requests
• Enhanced streaming event lifecycle with response.output_item.added, response.output_item.done,
  response.output_text.done, and response.content_part.done events
• Fixed item_id consistency across streamed tool-call events (changed from id to item_id in
  ToolCallArgumentsDone)
• Replaced non-spec ready status with in_progress/completed for tool calls
• Added reasoning_content caching and reinject mechanism across tool-call turns for CoT
  preservation
Diagram
flowchart LR
  A["Input Processing"] -->|function_call| B["Convert to Assistant Message"]
  A -->|function_call_output| C["Convert to Tool Message"]
  A -->|developer| D["Convert to System Message"]
  B --> E["Cache reasoning_content"]
  C --> E
  E --> F["Stream Events"]
  F -->|output_item.added| G["Tool Call Created"]
  F -->|function_call_arguments.delta| H["Arguments Streaming"]
  H -->|function_call_arguments.done| I["Arguments Complete"]
  I -->|output_item.done| J["Tool Call Done"]
  F -->|output_text.delta| K["Text Streaming"]
  K -->|output_text.done| L["Text Complete"]
  L -->|output_item.done| M["Message Done"]
  E -->|Next Turn| N["Reinject reasoning_content"]


File Changes

1. src/open_responses_server/models/responses_models.py ✨ Enhancement +19/-2

Add new event models and fix field naming

• Changed ToolCallArgumentsDone.id field to item_id for consistency with other event types
• Added three new event model classes: OutputItemAdded, OutputItemDone, and OutputTextDone to
 support complete streaming lifecycle
• Fixed trailing whitespace in ResponseCompleted class

src/open_responses_server/models/responses_models.py


2. src/open_responses_server/responses_service.py 🐞 Bug fix +289/-211

Enhance input handling and streaming event lifecycle

• Added global reasoning_content_cache dictionary to store and retrieve CoT (chain-of-thought)
 reasoning across tool-call turns
• Refactored convert_responses_to_chat_completions() to handle function_call,
 function_call_output, and developer input item types with proper message role conversion
• Enhanced process_chat_completions_stream() to emit complete event lifecycle:
 response.output_item.added, response.output_item.done, response.output_text.done,
 response.content_part.added, and response.content_part.done
• Changed tool call status from ready to completed for consistency
• Implemented reasoning_content caching keyed by call_id with LRU-style trimming to prevent
 unbounded growth
• Fixed item_id usage in tool call events to use tool_call["id"] instead of separate item_id
 field

src/open_responses_server/responses_service.py


3. tests/test_responses_service.py 🧪 Tests +16/-11

Update test expectations for new event types

• Updated test expectations for tool call status from ready to completed in two test cases
• Changed test assertion for test_tool_calls_created_event_emitted to verify
 response.output_item.added and response.output_item.done events instead of
 response.in_progress
• Updated test documentation and assertions for empty output handling to expect empty string instead
 of fallback text

tests/test_responses_service.py




@qodo-code-review

qodo-code-review bot commented Apr 9, 2026

Code Review by Qodo




Action required

1. Tool output missing call_id 🐞
Description
convert_responses_to_chat_completions reads function_call_output.call_id without a fallback and can
create tool/assistant messages with tool_call_id/id=None, which then fails matching/validation and
can drop tool outputs or produce invalid tool_calls structures.
Code

src/open_responses_server/responses_service.py[R215-257]

+                elif item_type == "function_call_output":
               call_id = item.get("call_id")
+                    output = item.get("output", "")
+                    logger.info(f"[INPUT] function_call_output: call_id={call_id} output_len={len(str(output))}")
+
+                    # Check if we have a corresponding assistant message with a matching tool call
               has_matching_tool_call = False
-                    
-                    # Look for a matching tool call in the existing messages
               for msg in messages:
                   if msg.get("role") == "assistant" and "tool_calls" in msg:
                       for tool_call in msg["tool_calls"]:
                           if tool_call.get("id") == call_id:
                               has_matching_tool_call = True
                               break
-                    
-                    # Debug: Log messages structure for debugging
-                    logger.info(f"[TOOL-OUTPUT-PROCESSING] Messages so far: {len(messages)} messages")
-                    for i, msg in enumerate(messages):
-                        logger.info(f"[TOOL-OUTPUT-PROCESSING] Message {i}: role={msg.get('role')}, has_tool_calls={'tool_calls' in msg}")
-                        if msg.get("role") == "tool":
-                            logger.info(f"[TOOL-OUTPUT-PROCESSING] Tool message {i}: call_id={msg.get('tool_call_id')}")
-                    
+
               if has_matching_tool_call:
-                        # Only add the tool response if we found a matching tool call
-                        tool_message = {
+                        messages.append({
                       "role": "tool",
                       "tool_call_id": call_id,
-                            "content": item.get("output", "")
-                        }
-                        messages.append(tool_message)
-                        logger.info(f"[TOOL-OUTPUT-PROCESSING] Added tool response for existing tool call {call_id}")
+                            "content": output
+                        })
+                        logger.info(f"[INPUT] Added tool response for call_id={call_id}")
               else:
-                        # If no matching tool call, we need to add an assistant message with the tool call first
-                        # as this could be from a previous conversation
+                        # Fallback: create synthetic assistant + tool message
                   tool_name = item.get("name", "unknown_tool")
-                        
-                        # Validate we have required fields
-                        if not tool_name or tool_name == "unknown_tool":
-                            logger.error(f"[TOOL-OUTPUT-PROCESSING] Cannot create tool call without tool name. Item: {item}")
-                            continue
-                        
-                        # Create an assistant message with a tool call
-                        assistant_message = {
-                            "role": "assistant",
-                            "content": None,
-                            "tool_calls": [{
-                                "id": call_id,
-                                "type": "function",
-                                "function": {
-                                    "name": tool_name,
-                                    "arguments": item.get("arguments", "{}")
-                                }
-                            }]
-                        }
-                        messages.append(assistant_message)
-                        
-                        # Then add the tool response
-                        tool_message = {
-                            "role": "tool",
-                            "tool_call_id": call_id,
-                            "content": item.get("output", "")
-                        }
-                        messages.append(tool_message)
-                        logger.info(f"[TOOL-OUTPUT-PROCESSING] Added assistant message with tool call and corresponding tool response for {tool_name}")
-                elif item.get("type") == "message" and item.get("role") == "assistant":
-                    # Handle assistant messages from previous conversations
-                    content = ""
-                    if "content" in item and isinstance(item["content"], list):
-                        for content_item in item["content"]:
-                            if isinstance(content_item, dict) and content_item.get("type") == "output_text":
-                                content += content_item.get("text", "")
-                    
-                    if content:
-                        messages.append({"role": "assistant", "content": content})
-                        logger.info(f"Added assistant message: {content[:100]}...")
+                        if tool_name and tool_name != "unknown_tool":
+                            messages.append({
+                                "role": "assistant",
+                                "content": None,
+                                "tool_calls": [{
+                                    "id": call_id,
+                                    "type": "function",
+                                    "function": {
+                                        "name": tool_name,
+                                        "arguments": item.get("arguments", "{}")
+                                    }
+                                }]
+                            })
+                            messages.append({
+                                "role": "tool",
+                                "tool_call_id": call_id,
+                                "content": output
+                            })
+                            logger.info(f"[INPUT] Created synthetic assistant+tool for {tool_name} call_id={call_id}")
Evidence
function_call_output handling uses call_id=item.get("call_id") with no fallback to item.id and then
uses it as both tool_call_id and tool_calls[].id in the synthetic assistant fallback;
validate_message_sequence only considers a tool message valid if it can find a preceding assistant
tool_calls entry with a matching id, so missing/mismatched IDs cause tool outputs to be treated as
orphaned and removed.

src/open_responses_server/responses_service.py[25-75]
src/open_responses_server/responses_service.py[214-260]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`function_call_output` input items can lack `call_id` (or use `id` instead). Current code sets `call_id = item.get('call_id')` and may generate tool messages / synthetic assistant tool_calls with `id=None`, which can be dropped by validation or cause invalid tool call structures.
### Issue Context
`function_call` items already use a fallback (`call_id` -> `id` -> generated). `function_call_output` should mirror this behavior.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[214-260]
- src/open_responses_server/responses_service.py[25-75]
### Suggested changes
- Derive `call_id = item.get('call_id') or item.get('id')` (and if still missing, generate `call_{uuid...}`) to ensure a non-null string.
- Ensure tool message `content` is a string (e.g., JSON-dump dict/list outputs) before passing into chat-completions.
- In the synthetic assistant+tool fallback, only create tool_calls with a valid string id; otherwise skip and log a warning.
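The suggested fallback can be sketched as two small helpers. The names here are illustrative, chosen to mirror the diff's fields; they are not part of the actual patch.

```python
# Sketch of the suggested call_id fallback and output stringification.
import json
import uuid

def resolve_call_id(item):
    """Derive a non-null string call id, mirroring the function_call fallback
    (call_id -> id -> generated)."""
    return item.get("call_id") or item.get("id") or f"call_{uuid.uuid4().hex}"

def tool_output_content(item):
    """Ensure tool message content is a string before passing downstream;
    dict/list outputs are JSON-dumped."""
    output = item.get("output", "")
    return output if isinstance(output, str) else json.dumps(output)
```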

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Stop history text mismatch 🐞
Description
On finish_reason=='stop' with no streamed text, the response output now contains an empty string,
but conversation history still stores "(No update)", changing subsequent turns when clients use
previous_response_id.
Code

src/open_responses_server/responses_service.py[R922-954]

+                            final_text = output_text_content or ""
+
+                            # Emit text closing events: output_text.done, content_part.done, output_item.done
+                            if final_text:
+                                # output_text.done
+                                yield f"data: {json.dumps({'type': 'response.output_text.done', 'item_id': message_id, 'output_index': 0, 'content_index': 0, 'text': final_text})}\n\n"
+                                # content_part.done
+                                yield f"data: {json.dumps({'type': 'response.content_part.done', 'item_id': message_id, 'output_index': 0, 'content_index': 0, 'part': {'type': 'output_text', 'text': final_text, 'annotations': []}})}\n\n"
+
+                            # Build the final message item
+                            final_msg_item = {
                           "id": message_id,
                           "type": "message",
                           "role": "assistant",
-                                "content": [{"type": "output_text", "text": output_text_content or "(No update)"}]
-                            }]
+                                "status": "completed",
+                                "content": [{"type": "output_text", "text": final_text, "annotations": []}]
+                            }
+
+                            # output_item.done
+                            yield f"data: {json.dumps({'type': 'response.output_item.done', 'output_index': 0, 'item': final_msg_item})}\n\n"
+
+                            logger.info(f"Response completed with text: {final_text[:100]}...")
+
+                            response_obj.status = "completed"
+                            response_obj.output = [final_msg_item]
                       completed_event = ResponseCompleted(
                           type="response.completed",
                           response=response_obj
                       )
-                            
+
                       # Save conversation history if we have chat_request available
                       if chat_request:
                           # Get the existing messages from the request
Evidence
The stop handler builds the final response item using final_text = output_text_content or '', but
the history-saving path still appends output_text_content or '(No update)';
convert_responses_to_chat_completions later loads conversation_history for previous_response_id and
forwards it as messages, so this placeholder becomes part of the next model prompt even though the
client saw an empty assistant message.

src/open_responses_server/responses_service.py[918-966]
src/open_responses_server/responses_service.py[101-107]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When a stop completion has no content, the response output now correctly uses `""`, but conversation history still stores `"(No update)"`. This diverges client-visible output from server-side history and can pollute subsequent turns when using `previous_response_id`.
### Issue Context
Stop handler:
- Uses `final_text = output_text_content or ""` for the completed response item.
- Uses `output_text_content or "(No update)"` for conversation history.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[918-966]
- src/open_responses_server/responses_service.py[101-107]
### Suggested changes
- In the stop handler history append, store `final_text` (or `output_text_content or ""`) instead of `"(No update)"`.
- Optionally add/adjust a test asserting history content matches the emitted completed output text for the empty-stop case.
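A minimal sketch of the suggested fix: derive `final_text` once and use it for both the emitted output item and the stored history entry, so the client-visible output and server-side history can no longer diverge. The helper name is hypothetical.

```python
# Hypothetical sketch: one final_text for both the response item and history.
def build_stop_artifacts(output_text_content, message_id):
    final_text = output_text_content or ""   # no "(No update)" placeholder
    output_item = {
        "id": message_id,
        "type": "message",
        "role": "assistant",
        "status": "completed",
        "content": [{"type": "output_text", "text": final_text, "annotations": []}],
    }
    history_entry = {"role": "assistant", "content": final_text}
    return output_item, history_entry
```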

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Done without added event 🐞
Description
The stop handler always emits response.output_item.done even when no text was streamed, so
response.output_item.added/content_part.added were never emitted, producing an invalid event
sequence for empty outputs.
Code

src/open_responses_server/responses_service.py[R918-946]

                   # If the finish reason is "stop", emit the completed event
                   if choice["finish_reason"] == "stop":
                       logger.info("Received stop finish reason")
-                            # If we have any text content, add it to the output
-                            if not response_obj.output:
-                                response_obj.output.append({
-                                    "id": message_id,
-                                    "type": "message",
-                                    "role": "assistant",
-                                    "content": [{"type": "output_text", "text": f"{output_text_content}\n\n" or "Done"}]
-                                })
-                            
-                            # Log complete output text
-                            logger.info(f"Response completed with text: {output_text_content[:100]}...\n\n")
-                                
-                            response_obj.status = "completed"
-                            response_obj.output= [{
+
+                            final_text = output_text_content or ""
+
+                            # Emit text closing events: output_text.done, content_part.done, output_item.done
+                            if final_text:
+                                # output_text.done
+                                yield f"data: {json.dumps({'type': 'response.output_text.done', 'item_id': message_id, 'output_index': 0, 'content_index': 0, 'text': final_text})}\n\n"
+                                # content_part.done
+                                yield f"data: {json.dumps({'type': 'response.content_part.done', 'item_id': message_id, 'output_index': 0, 'content_index': 0, 'part': {'type': 'output_text', 'text': final_text, 'annotations': []}})}\n\n"
+
+                            # Build the final message item
+                            final_msg_item = {
                           "id": message_id,
                           "type": "message",
                           "role": "assistant",
-                                "content": [{"type": "output_text", "text": output_text_content or "(No update)"}]
-                            }]
+                                "status": "completed",
+                                "content": [{"type": "output_text", "text": final_text, "annotations": []}]
+                            }
+
+                            # output_item.done
+                            yield f"data: {json.dumps({'type': 'response.output_item.done', 'output_index': 0, 'item': final_msg_item})}\n\n"
+
+                            logger.info(f"Response completed with text: {final_text[:100]}...")
+
+                            response_obj.status = "completed"
+                            response_obj.output = [final_msg_item]
Evidence
Message output_item.added/content_part.added are only emitted on the first content delta; if a stop
arrives with no content deltas, the stop path still yields response.output_item.done
unconditionally, meaning clients can receive a done event for an item they never saw added.

src/open_responses_server/responses_service.py[595-616]
src/open_responses_server/responses_service.py[918-946]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
For empty stop responses (no `delta.content`), the stream emits `response.output_item.done` but never emitted `response.output_item.added` / `response.content_part.added`. This breaks expected lifecycle ordering.
### Issue Context
- `output_item.added` for message output is only emitted when the first content chunk arrives.
- Stop handler emits `output_item.done` regardless of whether any message output item was added.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[595-616]
- src/open_responses_server/responses_service.py[918-946]
### Suggested changes
Choose one:
1) If `final_text == ""`, emit `response.output_item.added` (and optionally `response.content_part.added`) before `response.output_item.done`, OR
2) If `final_text == ""` and no message item was ever added, skip emitting `response.output_item.done` (and keep only `response.completed`).
Add a test asserting the lifecycle ordering for the empty-stop case.
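Option (1) above can be sketched as a small event-closing helper: if no `output_item.added` was ever emitted for the message item, emit it before the `done` so every `done` has a matching `added`. This is an illustrative outline, not the service's actual generator code.

```python
# Illustrative sketch of option (1): backfill the missing lifecycle events
# for an empty stop before closing the item.
def close_message_item(events_emitted, message_id, final_text):
    """Return the closing event sequence for a message output item."""
    events = []
    if "response.output_item.added" not in events_emitted:
        # Nothing was streamed, so the item was never announced; add it now.
        events.append({"type": "response.output_item.added", "output_index": 0,
                       "item": {"id": message_id, "type": "message"}})
    if final_text:
        events.append({"type": "response.output_text.done", "item_id": message_id,
                       "text": final_text})
    events.append({"type": "response.output_item.done", "output_index": 0,
                   "item": {"id": message_id, "type": "message",
                            "status": "completed"}})
    return events
```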

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


4. Duplicate tool output_index 🐞
Description
Tool call initialization assigns output_index=tool_call_counter but only increments
tool_call_counter when the first delta contains a function name; if name arrives later, multiple
tool calls can share the same output_index and output_item events can reference non-existent items.
Code

src/open_responses_server/responses_service.py[R536-579]

                           # Initialize tool call if not exists
                           if index not in tool_calls:
+                                    tool_call_id = tool_delta.get("id", f"call_{uuid.uuid4().hex}")
                               tool_calls[index] = {
-                                        "id": tool_delta.get("id", f"call_{uuid.uuid4().hex}"),
+                                        "id": tool_call_id,
                                   "type": tool_delta.get("type", "function"),
                                   "function": {
                                       "name": tool_delta.get("function", {}).get("name", ""),
-                                            "arguments": tool_delta.get("function", {}).get("arguments", ""),
+                                            "arguments": "",
                                   },
-                                        "item_id": f"tool_call_{uuid.uuid4().hex}",
                                   "output_index": tool_call_counter
                               }
-                                    
-                                    # If we got a tool name, emit the created event
+
+                                    # If we got a tool name, emit the output_item.added event
                               if "function" in tool_delta and "name" in tool_delta["function"]:
                                   tool_call = tool_calls[index]
                                   tool_call["function"]["name"] = tool_delta["function"]["name"]
-                                        # Log tool call creation
                                   logger.info(f"Tool call created: {tool_call['function']['name']}")
-                                        
-                                        # Check if this is an MCP tool or a user-defined tool
+
                                   is_mcp = mcp_manager.is_mcp_tool(tool_call["function"]["name"])
-                                        tool_status = "in_progress" if is_mcp else "ready"
-                                        
-                                        logger.info(f"[TOOL-CALL-CREATED] Tool '{tool_call['function']['name']}': is_mcp={is_mcp}, status={tool_status}")
-                                        
-                                        # Add the tool call to the response output in Responses API format
-                                        response_obj.output.append({
-                                            "arguments": tool_call["function"]["arguments"],
+                                        logger.info(f"[TOOL-CALL-CREATED] Tool '{tool_call['function']['name']}': is_mcp={is_mcp}, status=in_progress")
+
+                                        # Build the function_call item for the response output
+                                        fc_item = {
+                                            "arguments": "",
                                       "call_id": tool_call["id"],
                                       "name": tool_call["function"]["name"],
                                       "type": "function_call",
                                       "id": tool_call["id"],
-                                            "status": tool_status
-                                        })
-                                        
-                                        # Emit the in_progress event
-                                        in_progress_event = ResponseInProgress(
-                                            type="response.in_progress",
-                                            response=response_obj
+                                            "status": "in_progress"
+                                        }
+                                        response_obj.output.append(fc_item)
+
+                                        # Emit response.output_item.added
+                                        item_added_event = OutputItemAdded(
+                                            output_index=tool_call["output_index"],
+                                            item=fc_item
                                   )
-                                        
-                                        logger.info(f"Emitting {in_progress_event}")
-                                        yield f"data: {json.dumps(in_progress_event.dict())}\n\n"
+                                        logger.info(f"Emitting output_item.added for '{tool_call['function']['name']}'")
+                                        yield f"data: {json.dumps(item_added_event.dict())}\n\n"
                                   tool_call_counter += 1
                           
Evidence
The code sets output_index at tool-call creation using the current counter, but increments the
counter only inside the block that emits output_item.added (which is gated on the name being present
in that first delta). Later, finish_reason=='tool_calls' emits arguments.done and output_item.done
for every tool_call regardless of whether an output item was ever added, so the emitted
indices/items can collide or be out of sync with response_obj.output.

src/open_responses_server/responses_service.py[533-579]
src/open_responses_server/responses_service.py[758-801]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`tool_call_counter` is only incremented when an `output_item.added` is emitted (requires tool name present). If a model streams tool deltas without the name in the first fragment, multiple tool calls may get the same `output_index`, and later events (`arguments.done`, `output_item.done`) can refer to an item that was never added.
### Issue Context
This can happen if tool call fields arrive over multiple deltas (id/type first, name later).
### Fix Focus Areas
- src/open_responses_server/responses_service.py[533-579]
- src/open_responses_server/responses_service.py[758-801]
### Suggested changes
- Increment `tool_call_counter` immediately when creating a new `tool_calls[index]` entry, not conditionally.
- Emit an `output_item.added` as soon as the tool call is created (even if name is empty), then patch/update the stored item when name arrives.
- Alternatively, track a boolean like `tool_calls[index]['added_emitted']` and when name becomes available later, emit the missing `output_item.added` (without reusing an already-assigned `output_index`).
- Add a test where the first tool_calls delta lacks `function.name` and ensure unique output_index values and correct event ordering.
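The first suggested change can be sketched as follows: assign a unique `output_index` the moment a new streamed tool-call index appears, regardless of whether the name is present in that first delta. The helper is illustrative, not the service code.

```python
# Sketch: increment the counter at tool-call creation, not when the name arrives.
import itertools

def assign_output_indices(deltas):
    """Give each distinct streamed tool-call index its own output_index."""
    counter = itertools.count()
    tool_calls = {}
    for delta in deltas:
        index = delta["index"]
        if index not in tool_calls:
            # Increment unconditionally at creation time
            tool_calls[index] = {"output_index": next(counter), "name": ""}
        name = delta.get("function", {}).get("name")
        if name:
            # Patch the stored entry when the name shows up in a later delta
            tool_calls[index]["name"] = name
    return tool_calls
```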

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

5. Reasoning cache eviction incorrect 🐞
Description
reasoning_content_cache is trimmed by sorting random call_id strings (not by recency) and is only
size-bounded in the tool_calls finish path, so it can grow unbounded and eviction does not match the
intended "keep last 200" behavior.
Code

src/open_responses_server/responses_service.py[R842-853]

+                            # Cache reasoning_content keyed by call_ids for CoT passback
+                            # When Codex CLI sends these call_ids back, we inject the reasoning
+                            if reasoning_content:
+                                for tc in tool_calls.values():
+                                    reasoning_content_cache[tc["id"]] = reasoning_content
+                                logger.info(f"[COT-PASSBACK] Cached reasoning ({len(reasoning_content)} chars) for {len(tool_calls)} call_ids")
+                                # Trim cache if too large (keep last 200 entries)
+                                if len(reasoning_content_cache) > 200:
+                                    excess_keys = sorted(reasoning_content_cache.keys())[:len(reasoning_content_cache) - 200]
+                                    for k in excess_keys:
+                                        del reasoning_content_cache[k]
+
Evidence
The cache trimming logic deletes sorted(reasoning_content_cache.keys())[:excess], but call_id keys
are not ordered by insertion time, so this does not remove the oldest entries. Additionally, caching
also occurs in the finish_reason=='function_call' path without any trimming, so repeated requests
can grow the cache indefinitely.

src/open_responses_server/responses_service.py[17-21]
src/open_responses_server/responses_service.py[695-699]
src/open_responses_server/responses_service.py[842-853]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`reasoning_content_cache` eviction is not based on recency and is not applied in all insertion paths, so memory can grow unbounded and retained entries are effectively arbitrary.
### Issue Context
- Current eviction: sort keys (random strings) and delete first N.
- Trimming happens only in the `finish_reason == 'tool_calls'` path; other caching paths don’t trim.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[17-21]
- src/open_responses_server/responses_service.py[695-699]
- src/open_responses_server/responses_service.py[842-853]
### Suggested changes
- Replace the dict with an insertion-ordered structure (`collections.OrderedDict`) or track insertion timestamps.
- Enforce the size cap on every insertion (both `function_call` and `tool_calls` paths).
- Implement true LRU/FIFO eviction (e.g., `popitem(last=False)` for FIFO, or move-to-end on read if LRU).
- Consider namespacing by response/session if cross-request mixing is undesired.
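The suggested changes above can be sketched as a small bounded FIFO cache. This is a hedged illustration, not the PR's actual code; the helper name `cache_reasoning` and the constant `MAX_REASONING_ENTRIES` are hypothetical:

```python
from collections import OrderedDict

MAX_REASONING_ENTRIES = 200  # the cap the PR intends to enforce

reasoning_content_cache: "OrderedDict[str, str]" = OrderedDict()

def cache_reasoning(call_id: str, reasoning: str) -> None:
    """Insert with the size cap enforced on every insertion (FIFO eviction)."""
    if call_id in reasoning_content_cache:
        # Refresh position so a re-used call_id is treated as recent.
        reasoning_content_cache.move_to_end(call_id)
    reasoning_content_cache[call_id] = reasoning
    while len(reasoning_content_cache) > MAX_REASONING_ENTRIES:
        # popitem(last=False) evicts the oldest inserted entry, not an
        # arbitrary one picked by lexicographic key order.
        reasoning_content_cache.popitem(last=False)
```

Calling this helper from both the `function_call` and `tool_calls` finish paths would make eviction consistent everywhere reasoning is cached.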

@ivanopcode
Author

Heads up: this PR and #53 are functionally different, but both modify responses_service.py around tool-call handling. If both changes are merged, that integration will need careful manual conflict resolution to preserve both the Responses lifecycle changes here and the skill-tool routing from #53.

Contributor

Copilot AI left a comment


Pull request overview

This PR improves the Responses adapter to better match the Responses API, especially for tool-calling clients and models that emit separate reasoning_content.

Changes:

  • Expanded input item conversion to support function_call, function_call_output (incl. id fallback + non-string output normalization), and developer messages.
  • Implemented a richer streaming event lifecycle for tool calls and message items (added/done events, arguments.done, text done events) and standardized status values (in_progress / completed).
  • Added a bounded reasoning_content cache to reinject model reasoning across tool-call turns (“CoT passback”).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File descriptions:

  • tests/test_responses_service.py: Adds regression tests for reasoning cache eviction, tool output normalization, and stricter streaming lifecycle expectations.
  • src/open_responses_server/responses_service.py: Updates request input handling, adds CoT caching, and significantly expands streaming event lifecycle emissions.
  • src/open_responses_server/models/responses_models.py: Updates/extends streaming event models (e.g., output_item.*, output_text.done) and renames `ToolCallArgumentsDone.id` to `item_id`.


Comment on lines 592 to +650
  if index not in tool_calls:
+     tool_call_id = tool_delta.get("id", f"call_{uuid.uuid4().hex}")
      tool_calls[index] = {
-         "id": tool_delta.get("id", f"call_{uuid.uuid4().hex}"),
+         "id": tool_call_id,
          "type": tool_delta.get("type", "function"),
          "function": {
              "name": tool_delta.get("function", {}).get("name", ""),
-             "arguments": tool_delta.get("function", {}).get("arguments", ""),
+             "arguments": "",
          },
          "item_id": f"tool_call_{uuid.uuid4().hex}",
-         "output_index": tool_call_counter
+         "output_index": tool_call_counter,
+         "added_emitted": False,
      }

-     # If we got a tool name, emit the created event
-     if "function" in tool_delta and "name" in tool_delta["function"]:
-         tool_call = tool_calls[index]
-         tool_call["function"]["name"] = tool_delta["function"]["name"]
-         # Log tool call creation
-         logger.info(f"Tool call created: {tool_call['function']['name']}")
-
-         # Check if this is an MCP tool or a user-defined tool
-         is_mcp = mcp_manager.is_mcp_tool(tool_call["function"]["name"])
-         tool_status = "in_progress" if is_mcp else "ready"
-
-         logger.info(f"[TOOL-CALL-CREATED] Tool '{tool_call['function']['name']}': is_mcp={is_mcp}, status={tool_status}")
-
-         # Add the tool call to the response output in Responses API format
-         response_obj.output.append({
-             "arguments": tool_call["function"]["arguments"],
-             "call_id": tool_call["id"],
-             "name": tool_call["function"]["name"],
-             "type": "function_call",
-             "id": tool_call["id"],
-             "status": tool_status
-         })
-
-         # Emit the in_progress event
-         in_progress_event = ResponseInProgress(
-             type="response.in_progress",
-             response=response_obj
-         )
-
-         logger.info(f"Emitting {in_progress_event}")
-         yield f"data: {json.dumps(in_progress_event.dict())}\n\n"
-         tool_call_counter += 1
+
+ tool_call = tool_calls[index]
+
+ tool_call_counter += 1
+ if "function" in tool_delta and "name" in tool_delta["function"]:
+     tool_call["function"]["name"] = tool_delta["function"]["name"]
+     item_added_payload = ensure_tool_call_added(tool_call)
+     if item_added_payload:
+         yield item_added_payload

  # Process function arguments if present
  if "function" in tool_delta and "arguments" in tool_delta["function"]:
      arg_fragment = tool_delta["function"]["arguments"]
      tool_calls[index]["function"]["arguments"] += arg_fragment

      # Emit delta event
      args_event = ToolCallArgumentsDelta(
          type="response.function_call_arguments.delta",
-         item_id=tool_calls[index]["item_id"],
+         item_id=tool_calls[index]["id"],
          output_index=tool_calls[index]["output_index"],
          delta=arg_fragment
      )

      yield f"data: {json.dumps(args_event.dict())}\n\n"

  # Handle content (text)
  elif "content" in delta and delta["content"] is not None:
      content_delta = delta["content"]
      output_text_content += content_delta

-     # Create a new message if it doesn't exist
-     if not response_obj.output:
-         response_obj.output.append({
-             "id": message_id,
-             "type": "message",
-             "role": "assistant",
-             "content": [{"type": "output_text", "text": output_text_content or "(No update)"}]
-         })
+     # On first text chunk, emit output_item.added + content_part.added
+     if not response_obj.output or not any(
+         o.get("type") == "message" for o in response_obj.output
+     ):
+         msg_item = {
+             "id": message_id,
+             "type": "message",
+             "role": "assistant",
+             "status": "in_progress",
+             "content": []
+         }
+         response_obj.output.append(msg_item)
+         # output_item.added
+         yield f"data: {json.dumps({'type': 'response.output_item.added', 'output_index': 0, 'item': msg_item})}\n\n"
+         # content_part.added
+         yield f"data: {json.dumps({'type': 'response.content_part.added', 'item_id': message_id, 'output_index': 0, 'content_index': 0, 'part': {'type': 'output_text', 'text': '', 'annotations': []}})}\n\n"


Copilot AI Apr 17, 2026


output_index values can collide and/or not match the actual position in response_obj.output. Tool calls use tool_call_counter starting at 0, while message events hardcode output_index: 0; if text is emitted before/alongside tool calls (or vice versa), multiple items can share output_index=0 and tool calls may get an output_index that doesn't reflect their order in the output array. Consider maintaining a single monotonically increasing output_index for all output items (messages + tool calls) and storing it per item when you append to response_obj.output so events and response.completed.response.output stay consistent.
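The single-counter approach this comment suggests could look like the following sketch. The class name `OutputIndexAllocator` is illustrative, not from the PR; the point is that messages and tool calls draw indices from one shared counter:

```python
from typing import Any, Dict, List

class OutputIndexAllocator:
    """Hand out one monotonically increasing output_index for all item types."""

    def __init__(self) -> None:
        self._next = 0

    def alloc(self) -> int:
        idx = self._next
        self._next += 1
        return idx

# Message items and tool-call items share the same allocator, so each item's
# output_index matches its position in the output array.
alloc = OutputIndexAllocator()
output: List[Dict[str, Any]] = []
for item_type in ("message", "function_call", "function_call"):
    output.append({"type": item_type, "output_index": alloc.alloc()})
```

Storing the allocated index on each item when appending to response_obj.output would keep streamed events and the final response.completed payload consistent.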

Comment on lines +18 to +21
# Cache reasoning_content (CoT) keyed by tool call_id for passback.
# Keep a bounded insertion-ordered cache so recent tool-call chains can feed
# reasoning back into the next request without unbounded growth.
reasoning_content_cache: OrderedDict[str, str] = OrderedDict()

Copilot AI Apr 17, 2026


reasoning_content_cache is process-global and keyed only by client-provided call_id. If call_id is reused/collides across requests (or across different users in a shared deployment), reasoning content can be injected into an unrelated conversation, which is a potential cross-request data leak. Consider scoping by conversation/response identifier (e.g., (previous_response_id, call_id)) or storing reasoning alongside the per-response conversation_history instead of in a global cache.
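The scoping suggested here could be sketched as a composite-key cache. This is a hypothetical illustration (the names `put_reasoning` / `get_reasoning` are not from the PR), showing how a `(previous_response_id, call_id)` key isolates conversations even when call_id values collide:

```python
from collections import OrderedDict
from typing import Optional, Tuple

ScopedKey = Tuple[Optional[str], str]  # (previous_response_id, call_id)
scoped_reasoning_cache: "OrderedDict[ScopedKey, str]" = OrderedDict()

def put_reasoning(response_id: Optional[str], call_id: str,
                  reasoning: str, cap: int = 200) -> None:
    """Store reasoning under a conversation-scoped key, FIFO-bounded."""
    scoped_reasoning_cache[(response_id, call_id)] = reasoning
    while len(scoped_reasoning_cache) > cap:
        scoped_reasoning_cache.popitem(last=False)

def get_reasoning(response_id: Optional[str], call_id: str) -> Optional[str]:
    # A lookup from a different conversation misses even if call_id matches.
    return scoped_reasoning_cache.get((response_id, call_id))
```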

Comment on lines 478 to 500
# If we haven't already completed the response, do it now
if response_obj.status != "completed":
# If no output, add empty message
if not response_obj.output:
response_obj.output.append({
"id": message_id,
"type": "message",
"role": "assistant",
"content": [{"type": "output_text", "text": f"{output_text_content}\n\n" or "Done"}]
})

final_text = output_text_content or ""

# Emit text closing events if we had text content
if final_text:
yield f"data: {json.dumps({'type': 'response.output_text.done', 'item_id': message_id, 'output_index': 0, 'content_index': 0, 'text': final_text})}\n\n"
yield f"data: {json.dumps({'type': 'response.content_part.done', 'item_id': message_id, 'output_index': 0, 'content_index': 0, 'part': {'type': 'output_text', 'text': final_text, 'annotations': []}})}\n\n"

final_msg_item = {
"id": message_id,
"type": "message",
"role": "assistant",
"status": "completed",
"content": [{"type": "output_text", "text": final_text, "annotations": []}]
}

# Emit output_item.done if we have text
if final_text:
yield f"data: {json.dumps({'type': 'response.output_item.done', 'output_index': 0, 'item': final_msg_item})}\n\n"

response_obj.output = [final_msg_item] if final_text else response_obj.output
response_obj.status = "completed"

Copilot AI Apr 17, 2026


In the [DONE] handler, if final_text is empty you leave response_obj.output as-is (potentially empty) and skip emitting response.output_item.* lifecycle events. This can produce a response.completed event with no message output, which is inconsistent with the finish_reason == "stop" path (and with the goal of always having a valid message lifecycle even for empty text). Consider always creating an empty message item (and emitting output_item.added/done) when completing on [DONE] without any prior output.
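The "always emit a message lifecycle" idea could be sketched as below. The helper name `finalize_empty_output` is hypothetical and the event shapes follow the ones already used in this PR:

```python
import json
from typing import Dict, List

def finalize_empty_output(message_id: str) -> List[str]:
    """On [DONE] with no prior output, synthesize an empty message item and
    emit its full output_item.added / output_item.done lifecycle."""
    done_item: Dict = {
        "id": message_id,
        "type": "message",
        "role": "assistant",
        "status": "completed",
        "content": [{"type": "output_text", "text": "", "annotations": []}],
    }
    added_item = {**done_item, "status": "in_progress", "content": []}
    events = [
        {"type": "response.output_item.added", "output_index": 0, "item": added_item},
        {"type": "response.output_item.done", "output_index": 0, "item": done_item},
    ]
    # Frame each event as an SSE data line, matching the adapter's streaming style.
    return [f"data: {json.dumps(e)}\n\n" for e in events]
```

Appending done_item to response_obj.output before marking the response completed would keep the final payload consistent with the streamed events.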
