LCORE-1616 /v1/responses support for MCP tool merging (goose agent support)#1435

Open
dprince wants to merge 4 commits into lightspeed-core:main from dprince:merge_server_tools
Conversation

@dprince dprince commented Mar 30, 2026

Description

Add server-tool merging and filtering for Responses API

Introduce X-LCS-Merge-Server-Tools header that allows client-provided
tools to be merged with server-configured tools (RAG, MCP) instead of
replacing them. Tool conflicts (duplicate MCP server_label or duplicate
file_search) are rejected with a 409 error.
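From the client side, opting in is just an extra header on the request. A minimal sketch — the header name comes from this PR, but the payload shape and the `local-oc` tool are illustrative assumptions:

```python
MERGE_HEADER = "X-LCS-Merge-Server-Tools"

def build_responses_request(client_tools: list[dict]) -> tuple[dict, dict]:
    """Build headers and body for a /v1/responses call that opts into merging."""
    headers = {
        "Content-Type": "application/json",
        MERGE_HEADER: "true",  # ask LCS to merge server tools instead of replacing them
    }
    body = {
        "input": "diagnose the crash-looping pod in foo-staging",
        "tools": client_tools,  # client-local function / MCP tools
        "stream": True,
    }
    return headers, body

headers, body = build_responses_request(
    [{"type": "mcp", "server_label": "local-oc", "server_url": "http://localhost:9000"}]
)
assert headers[MERGE_HEADER] == "true"
```

Without the header, the request behaves as before: client tools replace server tools.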

Add is_server_deployed_output() to distinguish LCS-deployed tools from
client tools so only server tools are included in turn summaries,
metrics, and storage. Client tool output items are still returned in
the response.

Filter server-deployed MCP streaming events (mcp_call, mcp_list_tools,
mcp_approval_request) from the SSE stream so OpenAI-compatible clients
like Goose don't fail on unknown item types. Strip these items from the
response.completed output array as well.

Includes a doc/diagram of the goose (local agent) use case we are prototyping.

Type of change

  • Refactor
  • [x] New feature
  • Bug fix
  • CVE fix
  • Optimization
  • [x] Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: Claude

Checklist before requesting a review

  • [x] I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Added unit tests
  • Tested locally against a working lightspeed-stack installation using the /v1/responses API with Goose. Verified working MCP tools at both the local agent (Goose) level and via the lightspeed-stack (server) level. When no extra header was configured in Goose, only the client tools were active (previous behavior).

Summary by CodeRabbit

  • New Features

    • Optional merging of server-configured and client-provided tools via request header; conflicts return HTTP 409.
    • Streaming responses filter and hide server-deployed tool outputs from client event streams; final stored turn summaries include only server-deployed tool outputs while client tool outputs are returned to callers.
  • Documentation

    • New guide describing the hybrid tool execution model, merge header, conflict semantics, and streaming/filtering behavior.
  • Tests

    • Expanded coverage for merging, conflict handling, filtering, and server/client output classification.

coderabbitai bot commented Mar 30, 2026

Walkthrough

Adds a header-driven hybrid tool merging model for /v1/responses, conflict detection returning HTTP 409, SSE filtering to hide server-deployed MCP events from clients, extraction of response finalization into a helper, and updates to tool resolution and turn-summary logic to distinguish server-deployed vs client-provided outputs.

Changes

Cohort / File(s) Summary
Documentation
docs/goose-example-tool-merging.md
New doc describing header-driven merging (X-LCS-Merge-Server-Tools: true), conflict rules (MCP server_label, file_search), SSE filtering of server-deployed MCP events, and example flows.
Tool resolution & summaries
src/utils/responses.py
Added is_server_deployed_output() and internal _merge_tools(); resolve_tool_choice() now detects merge header and can merge server tools or raise HTTP 409 on conflicts; build_turn_summary() skips client-provided outputs for stored summaries.
Response streaming & finalization
src/app/endpoints/responses.py
SSE serialization uses model_dump(..., by_alias=True); added server-MCP detection and chunk-filtering logic to suppress server-deployed MCP substream events and strip server MCP items from streamed response.output; extracted finalization into _finalize_response().
Integration tests
tests/integration/endpoints/test_query_integration.py
Updated mocked MCP server_label and assertions to reflect server-deployed tool outputs being treated as server-side (fewer tool_calls included in stored response).
Unit tests
tests/unit/app/endpoints/test_responses.py, tests/unit/utils/test_responses.py
Added unit tests for _is_server_mcp_output_item, _should_filter_mcp_chunk, is_server_deployed_output, _merge_tools (conflict 409 cases), and resolve_tool_choice merge behavior.

Sequence Diagrams

sequenceDiagram
    participant Client
    participant Server as LCS Server
    participant ServerMCP as Server MCP
    participant ClientMCP as Client MCP

    Client->>Server: POST /v1/responses\ntools=[client_tools]\nX-LCS-Merge-Server-Tools: true
    Server->>ServerMCP: prepare_tools() (RAG + configured MCP)
    ServerMCP-->>Server: server_tools
    Note over Server: _merge_tools() checks conflicts:\n• MCP server_label collisions\n• file_search overlap
    alt Conflict Detected
        Server-->>Client: HTTP 409 Conflict
    else No Conflict
        Server->>Server: merged_tools = client_tools + server_tools
        Server-->>Client: Begin SSE streaming (merged tools available)
    end
sequenceDiagram
    participant Client
    participant Server as LCS Server
    participant MCP as Server-Deployed MCP
    participant Storage as Turn Summary & Storage

    Client->>Server: Request using merged tools
    Server->>MCP: Execute server-deployed tool calls
    MCP-->>Server: mcp_call / results
    Server->>Server: response_generator() streaming SSE
    Note over Server: Filter & hide server-deployed MCP item types\n(mcp_call, mcp_list_tools, mcp_approval_request)
    Server->>Client: SSE stream with server MCP events suppressed
    Server->>Server: _finalize_response()
    Note over Server: build_turn_summary() includes only server-deployed outputs
    Server->>Storage: Store summary/metrics with server tool outputs
    Server-->>Client: Complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'LCORE-1616 /v1/responses support for MCP tool merging (goose agent support)' directly and clearly describes the primary change: adding MCP tool merging support to the /v1/responses API endpoint with a focus on Goose agent compatibility.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



@dprince dprince changed the title /v1/responses support for MCP tool merging (goose agent support) LCORE-1616 /v1/responses support for MCP tool merging (goose agent support) Mar 30, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/goose-example-tool-merging.md`:
- Around line 95-142: The fenced code block containing the conversation flow
diagram (the block starting with "Goose                     LCS /v1/responses   
Llama Stack        MCP/RAG" and the opening triple backticks) is missing a
language specifier; update the opening fence from just ``` to ```text (or
```plaintext) so the diagram is treated as plain text in rendering and
syntax-highlighting. Locate the triple-backtick that begins the diagram and add
the specifier, leaving the rest of the block unchanged.
- Around line 14-50: The fenced ASCII diagram block in
docs/goose-example-tool-merging.md is missing a language specifier which linters
flag; update the opening fence for the diagram (the triple-backtick that
currently starts the block) to include a language token such as text or
plaintext (e.g., change ``` to ```text) so the ASCII diagram is annotated for
static analysis and rendering; ensure only the opening fence is changed and the
rest of the Goose Agent / OpenShift Cluster diagram content (the ASCII art)
remains untouched.

In `@src/utils/responses.py`:
- Around line 1126-1158: The is_server_deployed_output function currently treats
an MCPApprovalResponse (item_type "mcp_approval_response") as server-side
because it lacks a server_label; update is_server_deployed_output to explicitly
handle "mcp_approval_response" (e.g., return False to mark it as
client-provided) or implement lookup/propagation of server_label, and add a unit
test in TestIsServerDeployedOutput to cover the approval response case;
reference is_server_deployed_output and TestIsServerDeployedOutput when making
the change.

In `@tests/unit/utils/test_responses.py`:
- Around line 2773-2804: The test test_client_tools_with_merge_header only
asserts behavior when the header key uses uppercase "X-LCS-Merge-Server-Tools";
update the test (and the other affected tests around the same blocks) to also
call resolve_tool_choice with a lowercase header key "x-lcs-merge-server-tools"
and assert the same merged-tools behavior (same tools, tool_choice, etc.) so
header normalization won't cause regressions; locate the calls to
resolve_tool_choice and duplicate or parametrize them to include both header key
variants to validate lowercase handling.
- Around line 2626-2635: Add a complementary positive test that verifies
mcp_approval_request is classified server-side when the server label matches:
create a new test (e.g., test_mcp_approval_request_server_side) that mocks
configuration (mocker.patch("utils.responses.configuration")) with
mock_config.mcp_servers containing the target label (e.g., "client-mcp"), create
an item with item.type = "mcp_approval_request" and item.server_label =
"client-mcp", and assert is_server_deployed_output(item) is True; reference the
existing test_mcp_approval_request_client_side, the configuration mock, and the
is_server_deployed_output function to locate where to add this test.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f5204ab3-7d8c-490d-8111-8a64f71f83a1

📥 Commits

Reviewing files that changed from the base of the PR and between 1b68d36 and 20809d0.

📒 Files selected for processing (6)
  • docs/goose-example-tool-merging.md
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/unit/utils/test_responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use FastAPI imports from the fastapi module: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack client imports
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax str | int instead of Union[str, int] for type annotations
Use Optional[Type] for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for functions performing I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with @abstractmethod decorators for interface implementations
Use complete type annotations for all class attributes; avoid using Any type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed

Files:

  • src/app/endpoints/responses.py
  • src/utils/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (2)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
🪛 markdownlint-cli2 (0.22.0)
docs/goose-example-tool-merging.md

[warning] 14-14: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (13)
src/utils/responses.py (3)

1193-1195: LGTM! Server-deployed output filtering in turn summary.

The filter correctly excludes client-provided tool outputs from the turn summary, ensuring only server-deployed tool calls and results are stored in metrics and conversation history.


1363-1411: LGTM! Tool merge logic with conflict detection.

The implementation correctly:

  • Detects MCP conflicts by comparing server_label values
  • Detects file_search conflicts when both client and server provide one
  • Returns merged tools with client tools first
  • Uses appropriate HTTP 409 status code for conflicts

Minor observation: Line 1411 wraps inputs in list() unnecessarily since client_tools and server_tools are already typed as list[InputTool], but this has no functional impact.
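The merge/conflict contract can be sketched at the dict level; the real `_merge_tools()` operates on typed `InputTool` objects and raises `HTTPException` with status 409, so the names and exception here are stand-ins:

```python
class ToolConflictError(Exception):
    """Raised when client and server tools collide (maps to HTTP 409 in LCS)."""

def merge_tools(client_tools: list[dict], server_tools: list[dict]) -> list[dict]:
    """Merge client and server tools, rejecting duplicate MCP labels and file_search."""
    client_labels = {t["server_label"] for t in client_tools if t.get("type") == "mcp"}
    server_labels = {t["server_label"] for t in server_tools if t.get("type") == "mcp"}
    duplicates = client_labels & server_labels
    if duplicates:
        raise ToolConflictError(f"duplicate MCP server_label(s): {sorted(duplicates)}")

    client_fs = any(t.get("type") == "file_search" for t in client_tools)
    server_fs = any(t.get("type") == "file_search" for t in server_tools)
    if client_fs and server_fs:
        raise ToolConflictError("both client and server provide file_search")

    return client_tools + server_tools  # client tools first, as the review notes

merged = merge_tools(
    [{"type": "mcp", "server_label": "local-oc"}],
    [{"type": "mcp", "server_label": "server1"}, {"type": "file_search"}],
)
```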


1449-1476: LGTM! Header-driven merge behavior.

The merge logic correctly:

  • Checks for X-LCS-Merge-Server-Tools: true header (case-insensitive)
  • Only merges when the header is explicitly set
  • Preserves original behavior when header is absent
  • Reuses the same vector_store_ids resolved from client tools when preparing server tools
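A minimal sketch of the case-insensitive header check; FastAPI's `request.headers` is already case-insensitive, so a plain dict lowered by hand stands in for it here:

```python
def merge_requested(headers: dict[str, str]) -> bool:
    """True when the client explicitly asked for server-tool merging."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-lcs-merge-server-tools", "").strip().lower() == "true"

assert merge_requested({"X-LCS-Merge-Server-Tools": "true"})
assert merge_requested({"x-lcs-merge-server-tools": "TRUE"})
assert not merge_requested({})          # absent header preserves old behavior
assert not merge_requested({"X-LCS-Merge-Server-Tools": "false"})
```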
tests/integration/endpoints/test_query_integration.py (2)

731-736: LGTM! Test expectations updated for server-deployed tool filtering.

The assertions correctly reflect the new behavior where:

  • Only server-deployed tool calls are included in response.tool_calls
  • function_call items (client-provided) are excluded
  • The expected count is reduced from 2 to 1

The comment clearly explains the reasoning.


635-635: No issues found: server1 is a valid MCP server name configured in test fixtures.

Verification confirms the server_label change to "server1" aligns with the MCP server configuration in the test fixtures (tests/integration/test_configuration.py lines 85-93 define server1 as a valid MCP server). The test setup correctly pairs this label with the mcp_list_tools item, and the assertions properly validate server-deployed tool classification.

tests/unit/app/endpoints/test_responses.py (1)

1378-1427: LGTM! Comprehensive tests for _is_server_mcp_output_item.

The test class provides good coverage including:

  • MCP types with matching and non-matching server labels
  • All three MCP event types (mcp_call, mcp_list_tools, mcp_approval_request)
  • Missing server_label handling
  • Non-MCP types returning False
  • Empty configured labels edge case
src/app/endpoints/responses.py (5)

405-407: LGTM! Added by_alias=True for SSE serialization.

Using by_alias=True ensures field names are serialized according to their aliases (e.g., created_at vs createdAt), maintaining consistency with the OpenAI Responses specification for SSE events.

Also applies to: 423-425, 435-435, 451-453


465-480: LGTM! Helper for filtering serialized MCP output items.

The function correctly identifies server-deployed MCP items from the serialized response.output array by checking:

  1. Item type is one of the MCP types
  2. server_label matches a configured MCP server

This complements is_server_deployed_output from utils/responses.py which operates on typed objects rather than dicts.
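An approximate, dict-level version of that predicate — the helper name and the source of configured labels are assumptions:

```python
MCP_ITEM_TYPES = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}

def is_server_mcp_output_item(item: dict, configured_labels: set[str]) -> bool:
    """True when a serialized output item came from an LCS-configured MCP server."""
    return (
        item.get("type") in MCP_ITEM_TYPES
        and item.get("server_label") in configured_labels
    )

assert is_server_mcp_output_item({"type": "mcp_call", "server_label": "server1"}, {"server1"})
assert not is_server_mcp_output_item({"type": "mcp_call", "server_label": "local-oc"}, {"server1"})
assert not is_server_mcp_output_item({"type": "message"}, {"server1"})
```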


483-527: LGTM! Streaming chunk filter with output index tracking.

The implementation correctly:

  • Tracks output_index values when server MCP items are added
  • Filters intermediate events (response.mcp_call.*, response.mcp_list_tools.*) by tracked indices
  • Cleans up tracking when output_item.done is received using discard() (safe for missing keys)
  • Mutates server_mcp_output_indices as a side effect, which is documented via the function's purpose
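The index-tracking filter can be sketched with simplified dict events; the real implementation works on typed SSE chunks, so the event shapes below are assumptions:

```python
MCP_ITEM_TYPES = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}

def should_filter_chunk(chunk: dict, tracked: set[int], server_labels: set[str]) -> bool:
    """Return True when this SSE chunk belongs to a server-deployed MCP item.

    Mutates `tracked` as a side effect: indices are added when a server MCP
    item appears and discarded when its output_item.done arrives.
    """
    etype = chunk.get("type", "")
    if etype == "response.output_item.added":
        item = chunk.get("item", {})
        if item.get("type") in MCP_ITEM_TYPES and item.get("server_label") in server_labels:
            tracked.add(chunk["output_index"])  # start suppressing this index
            return True
        return False
    if etype == "response.output_item.done":
        idx = chunk.get("output_index")
        if idx in tracked:
            tracked.discard(idx)  # discard() is safe even if idx was never tracked
            return True
        return False
    # intermediate events (response.mcp_call.*, response.mcp_list_tools.*)
    # are filtered when their output_index is being tracked
    return chunk.get("output_index") in tracked

tracked: set[int] = set()
added = {"type": "response.output_item.added", "output_index": 0,
         "item": {"type": "mcp_call", "server_label": "server1"}}
assert should_filter_chunk(added, tracked, {"server1"})
```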

530-575: LGTM! Extracted finalization logic into helper.

The _finalize_response helper:

  • Consolidates post-stream metadata extraction
  • Uses is_server_deployed_output to filter tool summaries (consistent with non-streaming path)
  • Handles conversation persistence with append_turn_items_to_conversation

This improves readability of response_generator by separating streaming from finalization concerns.


636-644: LGTM! Filtering server MCP items from response output.

The inline list comprehension correctly removes server-deployed MCP items from the output array in the serialized response, ensuring clients only receive item types they understand (message, function_call, etc.).
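Sketched on a serialized payload (labels and item contents are invented for illustration):

```python
MCP_ITEM_TYPES = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}
SERVER_LABELS = {"server1"}  # stand-in for the configured MCP servers

output = [
    {"type": "mcp_list_tools", "server_label": "server1"},
    {"type": "function_call", "name": "oc_get_pods"},
    {"type": "message", "content": "diagnosis..."},
]
# strip server-deployed MCP items; clients like Goose only see types they understand
visible = [
    item for item in output
    if not (item["type"] in MCP_ITEM_TYPES
            and item.get("server_label") in SERVER_LABELS)
]
```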

docs/goose-example-tool-merging.md (1)

1-92: LGTM! Clear documentation for hybrid tool architecture.

The documentation provides excellent coverage of:

  • The separation of knowledge (server) vs action (client) tools
  • Header-driven merge behavior with clear behavior table
  • Conflict detection semantics
  • SSE streaming filter mechanics with step-by-step explanation
  • Turn summary behavior for server vs client outputs

The architecture and flow diagrams effectively illustrate the concepts.

tests/unit/utils/test_responses.py (1)

2576-2738: Good coverage on classification and conflict behavior.

The new TestIsServerDeployedOutput and TestMergeTools cases are well-targeted and align with the merge/conflict contract (including 409 behavior).

Comment on lines +14 to +50
```
OpenShift Cluster
┌────────────────────────────────────────────────────────────────────────────┐
│ │
│ namespace: foo-staging │
│ ┌─────────────────────────────────────┐ │
│ │ Goose Agent (Pod) │ │
│ │ │ │
│ │ ServiceAccount: goose-readonly │ │
│ │ (read-only, limited k8s/OCP access)│ │
│ │ │ HTTPS POST /v1/responses │
│ │ LLM reasoning loop │ │
│ │ ├─ LCS /v1/responses ──────────────┼─────────────────────┐ │
│ │ │ (knowledge queries) │ │ │
│ │ │ │ ▼ │
│ │ ├─ Local MCP tools │ namespace: lightspeed-stack │
│ │ │ (cluster read-only operations) │ ┌─────────────────────────┐ │
│ │ │ ├─ oc get pods │ │ Lightspeed Stack (LCS) │ │
│ │ │ ├─ oc logs │ │ │ │
│ │ │ └─ oc describe │ │ Llama Stack Engine │ │
│ │ └──────────────────────────────────┘ │ ├─ file_search (RAG) │ │
│ └──────────┬─────────────────────────── │ ├─ mcp: OKP/Solr │ │
│ │ ServiceAccount token │ ├─ mcp: errata │ │
│ │ (read-only RBAC) │ └─ mcp: bugzilla │ │
│ │ └───────────┬─────────────┘ │
│ │ ┌─────┴──────┐ │
│ │ │FAISS / Solr│ │
│ │ └────────────┘ │
│ │ │
│ │ namespace: foo-staging │
│ │ ┌───────────────────────────┐ │
│ └─▶│ pods, deployments, ... │ │
│ │ (read-only RBAC enforced)│ │
│ └───────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────────┘
```
🧹 Nitpick | 🔵 Trivial

Add language specifier to fenced code block.

Static analysis flagged this code block as missing a language specifier. For ASCII diagrams, use text or plaintext to satisfy linters and improve rendering hints.

-```
+```text
 OpenShift Cluster
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 14-14: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/goose-example-tool-merging.md` around lines 14 - 50, The fenced ASCII
diagram block in docs/goose-example-tool-merging.md is missing a language
specifier which linters flag; update the opening fence for the diagram (the
triple-backtick that currently starts the block) to include a language token
such as text or plaintext (e.g., change ``` to ```text) so the ASCII diagram is
annotated for static analysis and rendering; ensure only the opening fence is
changed and the rest of the Goose Agent / OpenShift Cluster diagram content (the
ASCII art) remains untouched.

Comment on lines +95 to +142
```
Goose LCS /v1/responses Llama Stack MCP/RAG
│ │ │ │
│ "pod api-7f8b9 is crash- │ │ │
│ looping in foo-staging, │ │ │
│ help me diagnose it" │ │ │
│ │ │ │
│ POST {input, tools, │ │ │
│ X-LCS-Merge-Server-Tools} │ │ │
│────────────────────────────▶│ │ │
│ │ merge client + │ │
│ │ server tools │ │
│ │ │ │
│ │ responses.create() │ │
│ │──────────────────────▶│ │
│ │ │ LLM: search RAG│
│ │ │───────────────▶│
│ │ │◀── RAG chunks: │
│ │ │ known OOM fix │
│ │ │ for api image │
│ │ │ │
│ │ │ LLM: query OKP │
│ │ │───────────────▶│
│ │ │◀── KB article: │
│ │ │ memory limits │
│ │ │ best practice │
│ │ │ │
│ │◀──────────────────────│ answer with │
│ │ │ RAG + KB ctx │
│ SSE: response.completed │ │ │
│ (server MCP items filtered)│ │ │
│◀────────────────────────────│ │ │
│ │ │ │
│ Goose combines LCS knowledge with live cluster data: │
│ │ │ │
│──▶ LOCAL MCP: oc get pods -n foo-staging │
│◀── pod/api-7f8b9 CrashLoopBackOff │
│──▶ LOCAL MCP: oc logs pod/api-7f8b9 │
│◀── OOMKilled │
│──▶ LOCAL MCP: oc describe pod/api-7f8b9 │
│◀── events, resource limits │
│ │
│ Goose correlates live cluster state with RAG/OKP │
│ knowledge to produce a diagnosis: │
│ "Pod is OOMKilled. RAG docs confirm a known fix │
│ for this image -- raise memory limit to 512Mi. │
│ See KB article for memory tuning best practices." │
```
🧹 Nitpick | 🔵 Trivial

Add language specifier to fenced code block.

Same as above, add text or plaintext language specifier for the conversation flow diagram.

-```
+```text
 Goose                     LCS /v1/responses        Llama Stack        MCP/RAG
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/goose-example-tool-merging.md` around lines 95 - 142, The fenced code
block containing the conversation flow diagram (the block starting with "Goose  
LCS /v1/responses        Llama Stack        MCP/RAG" and the opening triple
backticks) is missing a language specifier; update the opening fence from just
``` to ```text (or ```plaintext) so the diagram is treated as plain text in
rendering and syntax-highlighting. Locate the triple-backtick that begins the
diagram and add the specifier, leaving the rest of the block unchanged.

Comment on lines +1126 to +1158
```python
def is_server_deployed_output(output_item: ResponseOutput) -> bool:
    """Check if a response output item belongs to a tool deployed by LCS.

    In the hybrid architecture clients may provide their own tools (function
    tools or MCP servers running locally) alongside server-configured tools.
    This function identifies items that belong to LCS-deployed tools so that
    only those are included in server-side processing (turn summary, metrics,
    storage). Client tool output items are still returned in the response
    to the caller but are not processed internally.

    Args:
        output_item: A ResponseOutput item from the response.

    Returns:
        True if the item should be processed by LCS, False for client tools.
    """
    item_type = getattr(output_item, "type", None)

    # function_call items are always from client-provided tools;
    # LCS only configures file_search and mcp tools.
    if item_type == "function_call":
        return False

    # MCP items: check server_label against configured servers
    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        server_label = getattr(output_item, "server_label", None)
        if server_label is not None:
            configured_labels = {s.name for s in configuration.mcp_servers}
            return server_label in configured_labels

    # file_search_call, web_search_call, message, and unknown types
    # are treated as server-side.
    return True
```
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if mcp_approval_response has server_label attribute in the llama_stack_api types
rg -n "mcp_approval_response" --type=py -C5

Repository: lightspeed-core/lightspeed-stack

Length of output: 7880


🏁 Script executed:

#!/bin/bash
# Find MCPApprovalResponse definition and where approval responses are created
rg -n "class MCPApprovalResponse|MCPApprovalResponse.*=" --type=py -A3 | head -50

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Search for where mcp_approval_response items originate/are constructed
rg -n "MCPApprovalResponse\(" --type=py -B2 -A2

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Check if MCPApprovalResponse has server_label attribute
rg -n "approval_request_id|server_label" src/utils/responses.py | grep -A2 -B2 "820\|821\|822"

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Find MCPApprovalResponse import and usage
rg -n "MCPApprovalResponse" --type=py | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 732


🏁 Script executed:

#!/bin/bash
# Look at the build_tool_call_summary function more closely to understand MCPApprovalResponse structure
sed -n '815,835p' src/utils/responses.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 774


🏁 Script executed:

#!/bin/bash
# Check all response output types handled in is_server_deployed_output context
rg -n "type.*=.*\"mcp" --type=py | head -30

Repository: lightspeed-core/lightspeed-stack

Length of output: 2390


🏁 Script executed:

#!/bin/bash
# Check imports at top of responses.py to understand MCPApprovalResponse
head -50 src/utils/responses.py | grep -A5 -B5 "MCPApprovalResponse"

Repository: lightspeed-core/lightspeed-stack

Length of output: 439


🏁 Script executed:

#!/bin/bash
# Search for any mentions of server_label in approval context
rg -n "server_label" src/utils/responses.py | grep -B3 -A3 "approval"

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Check test for is_server_deployed_output to see if mcp_approval_response is tested
rg -n "is_server_deployed_output" --type=py -B2 -A10

Repository: lightspeed-core/lightspeed-stack

Length of output: 12041


🏁 Script executed:

#!/bin/bash
# Look for the full test class to see if there's a test for mcp_approval_response
sed -n '2576,2661p' tests/unit/utils/test_responses.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 3657


🏁 Script executed:

#!/bin/bash
# Check the MCPApprovalResponse type to see what attributes it has
rg -n "class.*MCPApprovalResponse|@dataclass.*approval" --type=py -A15 | head -50

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Check if approval_request_id can be used to track origin of approval responses
rg -n "approval_request_id" src/utils/responses.py -B3 -A3

Repository: lightspeed-core/lightspeed-stack

Length of output: 410


🏁 Script executed:

#!/bin/bash
# Check if there's a way to track which approval_request originated from which server
rg -n "approval_request_id" src/utils/ -A2 -B2 | head -80

Repository: lightspeed-core/lightspeed-stack

Length of output: 856


🏁 Script executed:

#!/bin/bash
# Look for where mcp_approval_response items are created or generated to see if there's origin tracking
rg -n "mcp_approval_response" src/ --type=py | grep -v test | grep -v ".pyc"

Repository: lightspeed-core/lightspeed-stack

Length of output: 373


🏁 Script executed:

#!/bin/bash
# Check if there's any validation or origin tracking for approval requests/responses
rg -n "MCPApprovalRequest|MCPApprovalResponse" src/ --type=py -A5 | grep -E "server_label|origin|source" | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 124


Add explicit handling for mcp_approval_response type or document design assumption.

The function checks mcp_approval_request but not mcp_approval_response. Since MCPApprovalResponse (imported from llama_stack_api.openai_responses) lacks a server_label attribute and only carries approval_request_id, this type falls through to the default return True, classifying all approval responses as server-deployed regardless of origin.

If client-provided MCP servers can emit approval responses, they would be incorrectly included in server-side processing (turn summary, storage) due to the missing origin check. Either:

  • Add server_label tracking to approval responses so they can be classified by origin, or
  • Document that approval responses are always treated as server-side by design (with rationale), or
  • Add test coverage for this scenario to TestIsServerDeployedOutput
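The first option above can be sketched as follows. This is a hypothetical illustration, not the project's code: the configuration stand-in, the SimpleNamespace items, and the exact branch layout of is_server_deployed_output are assumptions; the key point is giving "mcp_approval_response" an explicit branch instead of letting it hit the default return True.

```python
# Hypothetical sketch: classify "mcp_approval_response" explicitly
# rather than letting it fall through to the default `return True`.
from types import SimpleNamespace

# Stand-in for utils.responses.configuration with one server-side MCP server.
configuration = SimpleNamespace(mcp_servers=[SimpleNamespace(name="server-mcp")])


def is_server_deployed_output(item) -> bool:
    """Classify a Responses output item as server-deployed or client-provided."""
    configured_labels = {s.name for s in configuration.mcp_servers}
    item_type = getattr(item, "type", None)

    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        # These item types carry a server_label we can match against LCS config.
        return getattr(item, "server_label", None) in configured_labels

    if item_type == "mcp_approval_response":
        # MCPApprovalResponse only carries approval_request_id, so its origin
        # cannot be checked here; treat it as client-provided rather than
        # silently defaulting to server-side.
        return False

    # Everything else (messages, reasoning, etc.) is treated as server-side.
    return True


# Usage: an approval response no longer falls through to True.
resp = SimpleNamespace(type="mcp_approval_response", approval_request_id="req-1")
print(is_server_deployed_output(resp))  # False
```

Whether False is the right default depends on the design decision the review asks for; the sketch only shows that the type can be handled explicitly.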
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1126 - 1158, The
is_server_deployed_output function currently treats an MCPApprovalResponse
(item_type "mcp_approval_response") as server-side because it lacks a
server_label; update is_server_deployed_output to explicitly handle
"mcp_approval_response" (e.g., return False to mark it as client-provided) or
implement lookup/propagation of server_label, and add a unit test in
TestIsServerDeployedOutput to cover the approval response case; reference
is_server_deployed_output and TestIsServerDeployedOutput when making the change.

Comment on lines +2626 to +2635
def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
    """Test mcp_approval_request with unmatched label is client-side."""
    mock_config = mocker.Mock()
    mock_config.mcp_servers = []
    mocker.patch("utils.responses.configuration", mock_config)

    item = mocker.Mock()
    item.type = "mcp_approval_request"
    item.server_label = "client-mcp"
    assert is_server_deployed_output(item) is False
Contributor

🧹 Nitpick | 🔵 Trivial

Add a positive mcp_approval_request server-side classification test.

Line 2626 only validates the unmatched-label/client-side path. Please add the matching-label path too, since this type is part of the server-event filtering contract.

Proposed test addition
 class TestIsServerDeployedOutput:
@@
     def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
         """Test mcp_approval_request with unmatched label is client-side."""
@@
         assert is_server_deployed_output(item) is False
+
+    def test_mcp_approval_request_server_side_when_label_matches(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test mcp_approval_request with matching label is server-deployed."""
+        mock_config = mocker.Mock()
+        mock_server = mocker.Mock()
+        mock_server.name = "server-mcp"
+        mock_config.mcp_servers = [mock_server]
+        mocker.patch("utils.responses.configuration", mock_config)
+
+        item = mocker.Mock()
+        item.type = "mcp_approval_request"
+        item.server_label = "server-mcp"
+        assert is_server_deployed_output(item) is True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2626 - 2635, Add a
complementary positive test that verifies mcp_approval_request is classified
server-side when the server label matches: create a new test (e.g.,
test_mcp_approval_request_server_side) that mocks configuration
(mocker.patch("utils.responses.configuration")) with mock_config.mcp_servers
containing the target label (e.g., "client-mcp"), create an item with item.type
= "mcp_approval_request" and item.server_label = "client-mcp", and assert
is_server_deployed_output(item) is True; reference the existing
test_mcp_approval_request_client_side, the configuration mock, and the
is_server_deployed_output function to locate where to add this test.

Comment on lines +2773 to +2804
async def test_client_tools_with_merge_header(
    self, mocker: MockerFixture
) -> None:
    """Test client tools merged with server tools when header is set."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    mock_config = mocker.Mock()
    mock_config.configuration.byok_rag = []
    mocker.patch("utils.responses.configuration", mock_config)

    server_mcp = InputToolMCP(
        server_label="server-tool", server_url="http://server"
    )
    mocker.patch(
        "utils.responses.prepare_tools",
        new=mocker.AsyncMock(return_value=[server_mcp]),
    )

    client_tool = InputToolMCP(
        server_label="client-tool", server_url="http://client"
    )
    tools, tool_choice, _ = await resolve_tool_choice(
        tools=[client_tool],
        tool_choice=None,
        token="tok",
        request_headers={"X-LCS-Merge-Server-Tools": "true"},
    )
Contributor

⚠️ Potential issue | 🟡 Minor

Header-merge tests should also validate lowercase header keys.

Current assertions only use X-LCS-Merge-Server-Tools casing (Line 2803/2841/2899). Add a lowercase-key variant to prevent missing regressions when request headers are normalized before reaching resolve_tool_choice.

Proposed hardening for header-key coverage
 class TestResolveToolChoice:
@@
-    async def test_client_tools_with_merge_header(
-        self, mocker: MockerFixture
-    ) -> None:
+    @pytest.mark.parametrize(
+        "merge_header_key",
+        ["X-LCS-Merge-Server-Tools", "x-lcs-merge-server-tools"],
+    )
+    async def test_client_tools_with_merge_header(
+        self, mocker: MockerFixture, merge_header_key: str
+    ) -> None:
@@
         tools, tool_choice, _ = await resolve_tool_choice(
             tools=[client_tool],
             tool_choice=None,
             token="tok",
-            request_headers={"X-LCS-Merge-Server-Tools": "true"},
+            request_headers={merge_header_key: "true"},
         )

Also applies to: 2836-2842, 2895-2900

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2773 - 2804, The test
test_client_tools_with_merge_header only asserts behavior when the header key
uses uppercase "X-LCS-Merge-Server-Tools"; update the test (and the other
affected tests around the same blocks) to also call resolve_tool_choice with a
lowercase header key "x-lcs-merge-server-tools" and assert the same merged-tools
behavior (same tools, tool_choice, etc.) so header normalization won't cause
regressions; locate the calls to resolve_tool_choice and duplicate or
parametrize them to include both header key variants to validate lowercase
handling.

@tisnik
Contributor

tisnik commented Mar 30, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

ToolChoice none ignored

The no_tools flag (when tool_choice is none) is computed only in the branch where client tools are provided, but not passed to prepare_tools in the else branch. As a result, requests with no tools and tool_choice: "none" will still load server tools instead of disabling them.

    mcp_headers: Optional[McpHeaders] = None,
    request_headers: Optional[Mapping[str, str]] = None,
) -> tuple[Optional[list[InputTool]], Optional[ToolChoice], Optional[list[str]]]:
    """Resolve tools and tool_choice for the Responses API.

    When the request includes tools and the ``X-LCS-Merge-Server-Tools: true``
    header is set, client-provided tools are merged with server-configured
    tools (RAG, MCP).  Conflicts (e.g. a client MCP tool with the same
    server_label as a server-configured one, or duplicate file_search tools)
    are rejected with a 409 error.
    Without the header, client-provided tools are used as-is (original
    behavior).  When no tools are provided in the request, all
    LCORE-configured tools are loaded via prepare_tools.  The tool_choice
    "none" flag disables all tools.

    Args:
        tools: Tools from the request, or None to use LCORE-configured tools.
        tool_choice: Requested tool choice (e.g. auto, required, none) or None.
        token: User token for MCP/auth.
        mcp_headers: Optional MCP headers to propagate.
        request_headers: Optional request headers for tool resolution.

    Returns:
        A tuple of (prepared_tools, prepared_tool_choice, vector_store_ids):
        prepared_tools is the list of tools to use, or None if none configured;
        prepared_tool_choice is the resolved tool choice, or None when there
        are no tools; vector_store_ids is extracted from tools (in user-facing format)
        when provided, otherwise None.
    """
    prepared_tools: Optional[list[InputTool]] = None
    client = AsyncLlamaStackClientHolder().get_client()
    no_tools = (
        isinstance(tool_choice, ToolChoiceMode) and tool_choice == ToolChoiceMode.none
    )

    merge_server_tools = (
        request_headers is not None
        and request_headers.get("X-LCS-Merge-Server-Tools", "").lower() == "true"
    )

    if tools:  # explicitly specified in request
        # Per-request override of vector stores (user-facing rag_ids)
        vector_store_ids = extract_vector_store_ids_from_tools(tools)
        # Translate user-facing rag_ids to llama-stack vector_store_ids in each file_search tool
        byok_rags = configuration.configuration.byok_rag
        prepared_tools = translate_tools_vector_store_ids(tools, byok_rags)

        # Optionally merge server-configured tools (RAG, MCP) with client tools
        if merge_server_tools:
            server_tools = await prepare_tools(
                client=client,
                vector_store_ids=vector_store_ids,
                no_tools=no_tools,
                token=token,
                mcp_headers=mcp_headers,
                request_headers=request_headers,
            )
            if server_tools:
                prepared_tools = _merge_tools(prepared_tools, server_tools)
Incomplete MCP filtering

The _should_filter_mcp_chunk helper drops events for mcp_call and mcp_list_tools substreams but does not filter response.mcp_approval_request.* events. This may leak server-side approval request chunks to clients that cannot handle them.

def _should_filter_mcp_chunk(
    chunk: OpenAIResponseObjectStream,
    event_type: Optional[str],
    configured_mcp_labels: set[str],
    server_mcp_output_indices: set[int],
) -> bool:
    """Check if a streaming chunk is a server-deployed MCP event that should be filtered.

    Args:
        chunk: The streaming chunk to check.
        event_type: The event type of the chunk.
        configured_mcp_labels: Set of server_label names configured in LCS.
        server_mcp_output_indices: Tracked output indices of server-deployed MCP calls.

    Returns:
        True if the chunk should be filtered out from the client stream.
    """
    if event_type == "response.output_item.added":
        item_added_chunk = cast(OutputItemAddedChunk, chunk)
        item = item_added_chunk.item
        item_type = getattr(item, "type", None)
        if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
            server_label = getattr(item, "server_label", None)
            if server_label in configured_mcp_labels:
                server_mcp_output_indices.add(item_added_chunk.output_index)
                return True

    if event_type and (
        event_type.startswith("response.mcp_call.")
        or event_type.startswith("response.mcp_list_tools.")
    ):
        output_index = getattr(chunk, "output_index", None)
        if output_index in server_mcp_output_indices:
            return True

    if event_type == "response.output_item.done":
        item_done_chunk = cast(OutputItemDoneChunk, chunk)
        item = item_done_chunk.item
        item_type = getattr(item, "type", None)
        if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
            if item_done_chunk.output_index in server_mcp_output_indices:
                server_mcp_output_indices.discard(item_done_chunk.output_index)
                return True

    return False
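One way to close the gap the reviewer describes is to add the approval-request prefix to the substream filter. The sketch below is hedged: the prefix constant, the helper name, and the event-subtype suffix used in the example are illustrative, not the project's _should_filter_mcp_chunk; only the prefix-matching idea is the point.

```python
# Sketch: include "response.mcp_approval_request." alongside the
# mcp_call / mcp_list_tools substream prefixes so approval-request
# deltas for tracked server-deployed items are also dropped.
SERVER_MCP_EVENT_PREFIXES = (
    "response.mcp_call.",
    "response.mcp_list_tools.",
    "response.mcp_approval_request.",  # previously missing from the filter
)


def is_filtered_substream_event(event_type, output_index, server_mcp_output_indices):
    """Return True when a substream chunk belongs to a tracked server MCP item."""
    if event_type is None:
        return False
    # str.startswith accepts a tuple of prefixes.
    if not event_type.startswith(SERVER_MCP_EVENT_PREFIXES):
        return False
    return output_index in server_mcp_output_indices


# Usage: an approval-request delta at a tracked output index is dropped,
# while ordinary text deltas pass through. The ".arguments.delta" suffix
# here is illustrative.
tracked = {3}
print(is_filtered_substream_event("response.mcp_approval_request.arguments.delta", 3, tracked))  # True
print(is_filtered_substream_event("response.output_text.delta", 3, tracked))  # False
```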

@dprince dprince force-pushed the merge_server_tools branch from 20809d0 to f02648d on March 30, 2026 21:48
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 1378-1427: Add a regression test ensuring approval-request
substream events are filtered from client streams: in TestIsServerMcpOutputItem
add a new test (e.g., test_mcp_approval_request_stream_filtering) that simulates
a streaming chunk key starting with "response.mcp_approval_request." and asserts
the streaming filter logic rejects/filters that chunk so approval-request events
cannot leak to clients; reference the mcp approval path by using the
"response.mcp_approval_request.*" key and the existing helper
_is_server_mcp_output_item (and whatever streaming chunk filter function is used
in the responses code) to validate the chunk is filtered when appropriate.

In `@tests/unit/utils/test_responses.py`:
- Around line 2840-2860: Add a new unit test that mirrors
test_no_tools_uses_prepare_tools but calls resolve_tool_choice with
tool_choice="none" (and tools=None) and asserts that the patched prepare_tools
(utils.responses.prepare_tools) is invoked with no_tools=True; locate the test
around test_no_tools_uses_prepare_tools and use the same
AsyncLlamaStackClientHolder/resolve_tool_choice setup and mock_prepare
AsyncMock, then assert mock_prepare.assert_called_once_with(no_tools=True) (or
equivalent) to ensure prepare_tools receives the no_tools flag when
resolve_tool_choice(tool_choice="none") is used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d826784-c522-4d69-a050-222bc472dc2e

📥 Commits

Reviewing files that changed from the base of the PR and between 20809d0 and f02648d.

📒 Files selected for processing (2)
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / ci
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
🔇 Additional comments (3)
tests/unit/app/endpoints/test_responses.py (1)

15-20: Import addition is clean and consistent.

The new _is_server_mcp_output_item import (Line 16) fits the existing absolute internal import pattern and keeps the test module scope focused.

tests/unit/utils/test_responses.py (2)

2624-2633: Add a positive mcp_approval_request server-side classification test.

Line 2626 only validates the unmatched-label/client-side path. A matching-label test is needed since mcp_approval_request is part of the server-event filtering contract (lines 2585-2596 test both paths for mcp_call, lines 2611-2622 for mcp_list_tools).

Proposed test addition
 class TestIsServerDeployedOutput:
@@
     def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
         """Test mcp_approval_request with unmatched label is client-side."""
@@
         assert is_server_deployed_output(item) is False
+
+    def test_mcp_approval_request_server_side_when_label_matches(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test mcp_approval_request with matching label is server-deployed."""
+        mock_config = mocker.Mock()
+        mock_server = mocker.Mock()
+        mock_server.name = "server-mcp"
+        mock_config.mcp_servers = [mock_server]
+        mocker.patch("utils.responses.configuration", mock_config)
+
+        item = mocker.Mock()
+        item.type = "mcp_approval_request"
+        item.server_label = "server-mcp"
+        assert is_server_deployed_output(item) is True

2769-2802: Header-merge tests should validate lowercase header keys.

Current assertions only use X-LCS-Merge-Server-Tools casing (lines 2797, 2835, 2889). Add a lowercase-key variant to prevent missing regressions when request headers are normalized before reaching resolve_tool_choice.

Proposed hardening for header-key coverage
 class TestResolveToolChoice:
@@
-    async def test_client_tools_with_merge_header(
-        self, mocker: MockerFixture
-    ) -> None:
+    @pytest.mark.parametrize(
+        "merge_header_key",
+        ["X-LCS-Merge-Server-Tools", "x-lcs-merge-server-tools"],
+    )
+    async def test_client_tools_with_merge_header(
+        self, mocker: MockerFixture, merge_header_key: str
+    ) -> None:
@@
         tools, tool_choice, _ = await resolve_tool_choice(
             tools=[client_tool],
             tool_choice=None,
             token="tok",
-            request_headers={"X-LCS-Merge-Server-Tools": "true"},
+            request_headers={merge_header_key: "true"},
         )

Apply similar parametrization to test_merge_header_conflict_raises_409 and test_merge_header_no_server_tools_returns_client_only.

Comment on lines +2840 to +2860
async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
    """Test that no client tools falls through to prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
    mock_prepare = mocker.AsyncMock(return_value=[server_tool])
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)

    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice=None,
        token="tok",
    )
    assert tools is not None
    assert len(tools) == 1
    mock_prepare.assert_called_once()
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add test coverage for tool_choice="none" scenario.

The test only validates tool_choice=None (line 2855). Per the PR objectives, there's a known issue where tool_choice="none" may not correctly disable tools when tools=None. Add a test asserting that prepare_tools is called with no_tools=True when tool_choice="none".

Proposed test addition
     @pytest.mark.asyncio
     async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
         """Test that no client tools falls through to prepare_tools."""
@@
         assert len(tools) == 1
         mock_prepare.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_tool_choice_none_passes_no_tools_flag(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test that tool_choice='none' sets no_tools=True in prepare_tools."""
+        mock_client = mocker.AsyncMock()
+        mock_holder = mocker.Mock()
+        mock_holder.get_client.return_value = mock_client
+        mocker.patch(
+            "utils.responses.AsyncLlamaStackClientHolder",
+            return_value=mock_holder,
+        )
+        mock_prepare = mocker.AsyncMock(return_value=None)
+        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
+
+        tools, _, _ = await resolve_tool_choice(
+            tools=None,
+            tool_choice="none",
+            token="tok",
+        )
+        assert tools is None
+        mock_prepare.assert_called_once()
+        # Verify no_tools=True was passed
+        call_kwargs = mock_prepare.call_args.kwargs
+        assert call_kwargs.get("no_tools") is True
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
    """Test that no client tools falls through to prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
    mock_prepare = mocker.AsyncMock(return_value=[server_tool])
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice=None,
        token="tok",
    )
    assert tools is not None
    assert len(tools) == 1
    mock_prepare.assert_called_once()

async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
    """Test that no client tools falls through to prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
    mock_prepare = mocker.AsyncMock(return_value=[server_tool])
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice=None,
        token="tok",
    )
    assert tools is not None
    assert len(tools) == 1
    mock_prepare.assert_called_once()

@pytest.mark.asyncio
async def test_tool_choice_none_passes_no_tools_flag(
    self, mocker: MockerFixture
) -> None:
    """Test that tool_choice='none' sets no_tools=True in prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    mock_prepare = mocker.AsyncMock(return_value=None)
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice="none",
        token="tok",
    )
    assert tools is None
    mock_prepare.assert_called_once()
    # Verify no_tools=True was passed
    call_kwargs = mock_prepare.call_args.kwargs
    assert call_kwargs.get("no_tools") is True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2840 - 2860, Add a new unit
test that mirrors test_no_tools_uses_prepare_tools but calls resolve_tool_choice
with tool_choice="none" (and tools=None) and asserts that the patched
prepare_tools (utils.responses.prepare_tools) is invoked with no_tools=True;
locate the test around test_no_tools_uses_prepare_tools and use the same
AsyncLlamaStackClientHolder/resolve_tool_choice setup and mock_prepare
AsyncMock, then assert mock_prepare.assert_called_once_with(no_tools=True) (or
equivalent) to ensure prepare_tools receives the no_tools flag when
resolve_tool_choice(tool_choice="none") is used.

@dprince
Author

dprince commented Mar 30, 2026

@tisnik thanks for the review. I added a commit to address the 'incomplete MCP filtering'. For the 'no_tools' comment: I looked, and I think the no_tools variable is scoped to the entire function and is passed into both prepare_tools calls. Let me know if I am missing something.
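The scoping point can be illustrated with a toy sketch. Everything here is hypothetical stand-in code, not the project's resolve_tool_choice or prepare_tools; it only shows that a flag computed once at function scope reaches both call sites regardless of which branch runs.

```python
# Toy illustration of function-scoped `no_tools` feeding both branches.
import asyncio


async def prepare_tools(no_tools: bool):
    # Stand-in: the real prepare_tools loads RAG/MCP tools unless disabled.
    return None if no_tools else ["server-tool"]


async def resolve_tool_choice(tools, tool_choice):
    no_tools = tool_choice == "none"  # computed once, visible to both branches

    if tools:  # client tools provided: merge branch
        server_tools = await prepare_tools(no_tools=no_tools)
        return tools + (server_tools or [])
    # no client tools: fall through to server-configured tools
    return await prepare_tools(no_tools=no_tools)


# With tool_choice="none" and no client tools, nothing is loaded.
print(asyncio.run(resolve_tool_choice(None, "none")))  # None
```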

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 556-576: The persisted turn still contains client-side tool items
because append_turn_items_to_conversation is called with
latest_response_object.output; before calling append_turn_items_to_conversation
(after building turn_summary and rag_chunks) filter
latest_response_object.output to only include items that pass
is_server_deployed_output() (same predicate used earlier), and pass that
filtered list to append_turn_items_to_conversation so client-only MCP/function
outputs are not stored; reference latest_response_object.output,
is_server_deployed_output(), and append_turn_items_to_conversation() when making
the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7a365396-e85c-42bc-8ec0-e4067d6ace16

📥 Commits

Reviewing files that changed from the base of the PR and between f02648d and 2d6fea6.

📒 Files selected for processing (2)
  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use FastAPI imports from the fastapi module: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack client imports
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax str | int instead of Union[str, int] for type annotations
Use Optional[Type] for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for functions performing I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with @abstractmethod decorators for interface implementations
Use complete type annotations for all class attributes; avoid using Any type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed

Files:

  • src/app/endpoints/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/unit/app/endpoints/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
🔇 Additional comments (2)
tests/unit/app/endpoints/test_responses.py (1)

1466-1498: Good regression coverage for approval-request substreams.

This closes the response.mcp_approval_request.* hole that previously let unknown MCP events leak to OpenAI-compatible clients.

src/app/endpoints/responses.py (1)

531-537: The review comment is incorrect. Both code paths in update_azure_token() properly update the AsyncLlamaStackClientHolder's internal client:

  1. Library client mode (update_azure_token line 17): Calls reload_library_client(), which sets self._lsc = client (client.py:119)
  2. Service client mode (update_azure_token line 33): Calls update_provider_data(), which does self._lsc = self._lsc.copy(set_default_headers=updated_headers) (client.py:54)

Since AsyncLlamaStackClientHolder is a Singleton, when _finalize_response() calls get_client() at line 572, it retrieves the same holder instance that was just updated by update_azure_token(). The client will have fresh credentials—there is no staleness issue.

Comment on lines +556 to +576
for item in latest_response_object.output:
    if not is_server_deployed_output(item):
        continue
    tool_call, tool_result = build_tool_call_summary(item)
    if tool_call:
        turn_summary.tool_calls.append(tool_call)
    if tool_result:
        turn_summary.tool_results.append(tool_result)

tool_rag_chunks = parse_rag_chunks(
    latest_response_object,
    vector_store_ids,
    configuration.rag_id_mapping,
)
turn_summary.rag_chunks = inline_rag_context.rag_chunks + tool_rag_chunks

client = AsyncLlamaStackClientHolder().get_client()
if api_params.store and api_params.previous_response_id and latest_response_object:
    await append_turn_items_to_conversation(
        client, api_params.conversation, user_input, latest_response_object.output
    )
Contributor

⚠️ Potential issue | 🟠 Major

Filter client tool items before persisting the streamed turn.

Line 557 already gates internal processing with is_server_deployed_output(), but append_turn_items_to_conversation() on Lines 573-576 still receives the full latest_response_object.output. src/utils/conversations.py:454-497 stores every llm_output item it gets, so client-local MCP/function items are still persisted on the streaming path.

Proposed fix
     client = AsyncLlamaStackClientHolder().get_client()
     if api_params.store and api_params.previous_response_id and latest_response_object:
+        stored_output = [
+            item
+            for item in latest_response_object.output
+            if is_server_deployed_output(item)
+        ]
         await append_turn_items_to_conversation(
-            client, api_params.conversation, user_input, latest_response_object.output
+            client, api_params.conversation, user_input, stored_output
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 556 - 576, The persisted turn
still contains client-side tool items because append_turn_items_to_conversation
is called with latest_response_object.output; before calling
append_turn_items_to_conversation (after building turn_summary and rag_chunks)
filter latest_response_object.output to only include items that pass
is_server_deployed_output() (same predicate used earlier), and pass that
filtered list to append_turn_items_to_conversation so client-only MCP/function
outputs are not stored; reference latest_response_object.output,
is_server_deployed_output(), and append_turn_items_to_conversation() when making
the change.

dprince added 4 commits March 31, 2026 07:03
Introduce X-LCS-Merge-Server-Tools header that allows client-provided
tools to be merged with server-configured tools (RAG, MCP) instead of
replacing them. Tool conflicts (duplicate MCP server_label or duplicate
file_search) are rejected with a 409 error.

Add is_server_deployed_output() to distinguish LCS-deployed tools from
client tools so only server tools are included in turn summaries,
metrics, and storage. Client tool output items are still returned in
the response.

Filter server-deployed MCP streaming events (mcp_call, mcp_list_tools,
mcp_approval_request) from the SSE stream so OpenAI-compatible clients
like Goose don't fail on unknown item types. Strip these items from the
response.completed output array as well.
Document the Goose agent integration pattern where server-configured tools
(RAG, MCP knowledge services) are merged with client-provided tools
(oc/kubectl). Covers the X-LCS-Merge-Server-Tools header behavior,
conflict detection, server-tool filtering in streamed responses, and
an end-to-end conversation flow example.
Cover the new functions introduced in the server-tool merging feature:
- is_server_deployed_output: classify output items as server vs client
- _merge_tools: merge client and server tools with conflict detection
- resolve_tool_choice: X-LCS-Merge-Server-Tools header integration
- _is_server_mcp_output_item: filter serialized MCP items in streaming

29 new tests across utils/test_responses.py and endpoints/test_responses.py
validating correct behavior for MCP label matching, 409 conflict rejection,
file_search deduplication, and pass-through of non-MCP item types.
…lter_mcp_chunk

The helper already handled mcp_approval_request items in the output_item.added
and output_item.done branches, but the substream event prefix check only matched
response.mcp_call.* and response.mcp_list_tools.*, allowing server-side approval
request progress events to leak to clients that cannot handle them.

Add the missing response.mcp_approval_request.* prefix check and add unit tests
for _should_filter_mcp_chunk covering all three substream event types.
@dprince dprince force-pushed the merge_server_tools branch from 2d6fea6 to d0056d5 Compare March 31, 2026 11:04

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/responses.py (1)

1202-1215: ⚠️ Potential issue | 🟠 Major

Filter file_search metadata with the same server-only predicate.

Lines 1202-1204 and Line 1215 still parse the full response.output; the new guard at Lines 1207-1208 only protects tool_calls and tool_results. A client-provided file_search will still populate TurnSummary.referenced_documents and TurnSummary.rag_chunks, so those client results continue to be stored.
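
The fix this comment asks for can be sketched as pre-filtering `response.output` with the same predicate before the metadata parsers run. The shapes below are simplified stand-ins (plain dicts with a hypothetical `server_deployed` flag), not the real response types.

```python
def is_server_deployed_output(item: dict) -> bool:
    # Stand-in predicate: the real one inspects item type and server_label.
    return item.get("server_deployed", False)


def filter_server_outputs(output: list[dict]) -> list[dict]:
    """Keep only server-deployed items so downstream parsers never see client results."""
    return [item for item in output if is_server_deployed_output(item)]


output = [
    {"type": "file_search_call", "server_deployed": True},
    {"type": "file_search_call", "server_deployed": False},  # client-provided
]
# Passing server_outputs (instead of the full output) to the equivalents of
# parse_referenced_documents/parse_rag_chunks keeps client file_search results
# out of TurnSummary.referenced_documents and TurnSummary.rag_chunks.
server_outputs = filter_server_outputs(output)
```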

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1202 - 1215, The parsing of referenced
documents and RAG chunks must use the same server-only filter as tool
call/result processing: before calling parse_referenced_documents(response,
vector_store_ids, rag_id_mapping) and parse_rag_chunks(response,
vector_store_ids, rag_id_mapping), filter response.output using
is_server_deployed_output to produce a server_outputs list (or add an
overload/parameter to those functions to accept filtered output); then pass that
filtered output into parse_referenced_documents and parse_rag_chunks so
client-provided file_search entries are excluded when populating
TurnSummary.referenced_documents and TurnSummary.rag_chunks.
♻️ Duplicate comments (4)
tests/unit/utils/test_responses.py (1)

2840-2860: 🛠️ Refactor suggestion | 🟠 Major

Add the tool_choice="none" regression case.

These tests only exercise tools=None with tool_choice=None. The regression called out in the PR discussion was tools=None plus tool_choice="none" not propagating no_tools=True into prepare_tools(), so this suite still will not catch that branch.

Proposed test addition
     `@pytest.mark.asyncio`
     async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
         """Test that no client tools falls through to prepare_tools."""
@@
         assert tools is not None
         assert len(tools) == 1
         mock_prepare.assert_called_once()
+
+    `@pytest.mark.asyncio`
+    async def test_tool_choice_none_passes_no_tools_flag(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test that tool_choice='none' forwards no_tools=True to prepare_tools."""
+        mock_client = mocker.AsyncMock()
+        mock_holder = mocker.Mock()
+        mock_holder.get_client.return_value = mock_client
+        mocker.patch(
+            "utils.responses.AsyncLlamaStackClientHolder",
+            return_value=mock_holder,
+        )
+        mock_prepare = mocker.AsyncMock(return_value=None)
+        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
+
+        tools, _, _ = await resolve_tool_choice(
+            tools=None,
+            tool_choice="none",
+            token="tok",
+        )
+
+        assert tools is None
+        assert mock_prepare.call_args.kwargs["no_tools"] is True

As per coding guidelines, "Maintain test coverage of at least 60% for unit tests and 10% for integration tests".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2840 - 2860, Add a
regression test covering the case tools=None with tool_choice="none": copy or
extend the existing async test_no_tools_uses_prepare_tools and call
resolve_tool_choice with tools=None and tool_choice="none" (keeping the same
mocks for AsyncLlamaStackClientHolder and prepare_tools), then assert that
prepare_tools was awaited (mock_prepare.assert_called_once()) and that the
returned tools list matches the mocked server_tool; this ensures
resolve_tool_choice properly sets no_tools and passes it through into
prepare_tools.
docs/goose-example-tool-merging.md (1)

14-14: ⚠️ Potential issue | 🟡 Minor

Add language specifiers to both ASCII diagram fences.

Both fenced blocks are still bare triple backticks, so markdownlint will keep flagging MD040 on this page.

Proposed fix
-```
+```text
 OpenShift Cluster
-```
+```text
 Goose                     LCS /v1/responses        Llama Stack        MCP/RAG

Also applies to: 95-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/goose-example-tool-merging.md` at line 14, The two ASCII diagram fenced
code blocks use bare triple backticks and trigger markdownlint MD040; update
each opening fence (the lines with ```) to include a language specifier such as
"text" so they read ```text. Locate the two bare fences that wrap the diagrams
(the one before "OpenShift Cluster" and the one before "Goose                   
LCS /v1/responses        Llama Stack        MCP/RAG") and modify their opening
fences to ```text to silence MD040.
src/app/endpoints/responses.py (1)

556-576: ⚠️ Potential issue | 🟠 Major

Persist only server-deployed outputs on the streaming follow-up path.

Line 575 still passes latest_response_object.output wholesale into append_turn_items_to_conversation(). src/utils/conversations.py:454-494 persists every item it receives, so client-local function_call / MCP outputs are still written to the conversation store even though Lines 556-563 already skip them from the internal summary.

Proposed fix
     if api_params.store and api_params.previous_response_id and latest_response_object:
+        stored_output = [
+            item
+            for item in latest_response_object.output
+            if is_server_deployed_output(item)
+        ]
         await append_turn_items_to_conversation(
-            client, api_params.conversation, user_input, latest_response_object.output
+            client, api_params.conversation, user_input, stored_output
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 556 - 576, The call to
append_turn_items_to_conversation currently passes latest_response_object.output
directly, causing non-server outputs (e.g., function_call / MCP) to be
persisted; filter the outputs to only server-deployed items before calling
append_turn_items_to_conversation. Use the existing is_server_deployed_output
predicate (used earlier in the loop) to build a filtered list of outputs (e.g.,
persisted_outputs = [item for item in latest_response_object.output if
is_server_deployed_output(item)]) and pass that filtered list into
append_turn_items_to_conversation when invoking it via
AsyncLlamaStackClientHolder().get_client().
src/utils/responses.py (1)

1163-1171: ⚠️ Potential issue | 🟡 Minor

mcp_approval_response still bypasses the client/server split.

Lines 1163-1167 classify mcp_approval_request, but mcp_approval_response is not checked and falls through to the default True at Lines 1169-1171. A client-owned approval flow will therefore still create a server-side tool_result with no matching server tool_call. Track ownership for approval responses before the fallback, or explicitly exclude this type.
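
One way to close the gap is to classify approval responses by `server_label` ownership alongside the other MCP item types. This is a simplified sketch: the real predicate in `src/utils/responses.py` has more branches (for example, it also treats unlabeled `mcp_call` items as server-deployed).

```python
MCP_ITEM_TYPES = (
    "mcp_call",
    "mcp_list_tools",
    "mcp_approval_request",
    "mcp_approval_response",  # previously fell through to the default True
)


def is_server_deployed_output(item: dict, configured_labels: set[str]) -> bool:
    """Classify MCP items, approval responses included, by server_label ownership."""
    if item.get("type") in MCP_ITEM_TYPES:
        return item.get("server_label") in configured_labels
    return True  # non-MCP output items keep the existing default


labels = {"knowledge-mcp"}
client_owned = is_server_deployed_output(
    {"type": "mcp_approval_response", "server_label": "oc-local"}, labels
)
server_owned = is_server_deployed_output(
    {"type": "mcp_approval_response", "server_label": "knowledge-mcp"}, labels
)
```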

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1163 - 1171, The function that
determines server-side items currently checks item_type against ("mcp_call",
"mcp_list_tools", "mcp_approval_request") but omits "mcp_approval_response",
causing responses to fall through to the default True; update the conditional in
src/utils/responses.py (the branch using item_type, server_label,
configuration.mcp_servers, and output_item) to also handle
"mcp_approval_response" (or add an explicit branch for mcp_approval_response) so
the code checks server_label ownership for approval responses before the final
return True fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/goose-example-tool-merging.md`:
- Around line 82-84: Update the docs to state that events matching tracked
output_index are dropped for approval requests as well: in the list replace or
extend bullet 2 to include response.mcp_approval_request.* alongside
response.mcp_call.* and response.mcp_list_tools.*, and ensure the surrounding
bullets referencing response.output_item.added and response.output_item.done
still describe tracking/clearing of the output_index for server_label-matched
MCP items.

In `@src/app/endpoints/responses.py`:
- Around line 465-528: The MCP filtering helpers currently only treat labeled
MCP items as server-deployed, but is_server_deployed_output() considers
unlabeled "mcp_call" as server-deployed; update _is_server_mcp_output_item and
_should_filter_mcp_chunk to mirror that logic by treating an MCP item with no
server_label (None/absent) as server-deployed as well as those whose
server_label is in configured_mcp_labels. Concretely, change the label checks in
_is_server_mcp_output_item, the response.output_item.added branch in
_should_filter_mcp_chunk (the item_type/server_label check and adding to
server_mcp_output_indices), and the response.output_item.done branch to also
accept server_label is None for "mcp_call" (and the other MCP types if intended)
so unlabeled server-deployed MCP streams are tracked and filtered.

In `@src/utils/responses.py`:
- Around line 1473-1489: The code passes an empty vector_store_ids (from
extract_vector_store_ids_from_tools) into prepare_tools which is treated as an
explicit override and suppresses server RAG; change the call so prepare_tools
only receives vector_store_ids when the list is non-empty (e.g., pass
vector_store_ids=vector_store_ids if vector_store_ids else None or omit the
arg), leaving other args (client, no_tools, token, mcp_headers, request_headers)
unchanged; this ensures prepare_tools can still add server file_search when the
client provided none while preserving
translate_tools_vector_store_ids/prepared_tools and the subsequent _merge_tools
behavior.

---

Outside diff comments:
In `@src/utils/responses.py`:
- Around line 1202-1215: The parsing of referenced documents and RAG chunks must
use the same server-only filter as tool call/result processing: before calling
parse_referenced_documents(response, vector_store_ids, rag_id_mapping) and
parse_rag_chunks(response, vector_store_ids, rag_id_mapping), filter
response.output using is_server_deployed_output to produce a server_outputs list
(or add an overload/parameter to those functions to accept filtered output);
then pass that filtered output into parse_referenced_documents and
parse_rag_chunks so client-provided file_search entries are excluded when
populating TurnSummary.referenced_documents and TurnSummary.rag_chunks.

---

Duplicate comments:
In `@docs/goose-example-tool-merging.md`:
- Line 14: The two ASCII diagram fenced code blocks use bare triple backticks
and trigger markdownlint MD040; update each opening fence (the lines with ```)
to include a language specifier such as "text" so they read ```text. Locate the
two bare fences that wrap the diagrams (the one before "OpenShift Cluster" and
the one before "Goose                     LCS /v1/responses        Llama Stack  
MCP/RAG") and modify their opening fences to ```text to silence MD040.

In `@src/app/endpoints/responses.py`:
- Around line 556-576: The call to append_turn_items_to_conversation currently
passes latest_response_object.output directly, causing non-server outputs (e.g.,
function_call / MCP) to be persisted; filter the outputs to only server-deployed
items before calling append_turn_items_to_conversation. Use the existing
is_server_deployed_output predicate (used earlier in the loop) to build a
filtered list of outputs (e.g., persisted_outputs = [item for item in
latest_response_object.output if is_server_deployed_output(item)]) and pass that
filtered list into append_turn_items_to_conversation when invoking it via
AsyncLlamaStackClientHolder().get_client().

In `@src/utils/responses.py`:
- Around line 1163-1171: The function that determines server-side items
currently checks item_type against ("mcp_call", "mcp_list_tools",
"mcp_approval_request") but omits "mcp_approval_response", causing responses to
fall through to the default True; update the conditional in
src/utils/responses.py (the branch using item_type, server_label,
configuration.mcp_servers, and output_item) to also handle
"mcp_approval_response" (or add an explicit branch for mcp_approval_response) so
the code checks server_label ownership for approval responses before the final
return True fallback.

In `@tests/unit/utils/test_responses.py`:
- Around line 2840-2860: Add a regression test covering the case tools=None with
tool_choice="none": copy or extend the existing async
test_no_tools_uses_prepare_tools and call resolve_tool_choice with tools=None
and tool_choice="none" (keeping the same mocks for AsyncLlamaStackClientHolder
and prepare_tools), then assert that prepare_tools was awaited
(mock_prepare.assert_called_once()) and that the returned tools list matches the
mocked server_tool; this ensures resolve_tool_choice properly sets no_tools and
passes it through into prepare_tools.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2c6dc5e3-b90c-4e1e-999d-b9d851150345

📥 Commits

Reviewing files that changed from the base of the PR and between 2d6fea6 and d0056d5.

📒 Files selected for processing (6)
  • docs/goose-example-tool-merging.md
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/unit/utils/test_responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use FastAPI imports from the fastapi module: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack client imports
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax str | int instead of Union[str, int] for type annotations
Use Optional[Type] for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for functions performing I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with @abstractmethod decorators for interface implementations
Use complete type annotations for all class attributes; avoid using Any type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed

Files:

  • src/app/endpoints/responses.py
  • src/utils/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (2)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
🪛 markdownlint-cli2 (0.22.0)
docs/goose-example-tool-merging.md

[warning] 14-14: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (2)
src/utils/responses.py (2)

1397-1424: Conflict handling here looks good.

Failing fast on duplicate MCP server_label and duplicate file_search avoids silent shadowing and gives callers a deterministic 409.


1496-1503: Good fix for tool_choice="none" without explicit tools.

Passing no_tools into the fallback prepare_tools() call preserves the expected "no tools" behavior when the request omits tools.
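
The fallback path described here can be sketched with heavily simplified signatures (the real functions take a client, token, and headers):

```python
import asyncio


async def prepare_tools(no_tools: bool = False):
    # Stand-in for the server-side helper: it honors the no_tools flag.
    return None if no_tools else [{"type": "file_search"}]


async def resolve_tool_choice(tools, tool_choice):
    """When the request omits tools, forward no_tools so 'none' really disables them."""
    if tools is None:
        return await prepare_tools(no_tools=(tool_choice == "none"))
    return tools


none_result = asyncio.run(resolve_tool_choice(None, "none"))
default_result = asyncio.run(resolve_tool_choice(None, None))
```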

Comment on lines +82 to +84
1. **`response.output_item.added`** -- If the item is an `mcp_call`/`mcp_list_tools`/`mcp_approval_request` with a `server_label` matching a configured MCP server, the event is dropped and the `output_index` is tracked.
2. **`response.mcp_call.*` / `response.mcp_list_tools.*`** -- Events with a tracked `output_index` are dropped.
3. **`response.output_item.done`** -- Matching items are dropped and the index tracking is cleared.

⚠️ Potential issue | 🟡 Minor

Document response.mcp_approval_request.* filtering as well.

Bullet 2 still lists only response.mcp_call.* and response.mcp_list_tools.*, but the implementation now drops response.mcp_approval_request.* events too. As written, this understates the SSE contract for Goose-style clients.

Proposed fix
-2. **`response.mcp_call.*` / `response.mcp_list_tools.*`** -- Events with a tracked `output_index` are dropped.
+2. **`response.mcp_call.*` / `response.mcp_list_tools.*` / `response.mcp_approval_request.*`** -- Events with a tracked `output_index` are dropped.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 1. **`response.output_item.added`** -- If the item is an `mcp_call`/`mcp_list_tools`/`mcp_approval_request` with a `server_label` matching a configured MCP server, the event is dropped and the `output_index` is tracked.
-2. **`response.mcp_call.*` / `response.mcp_list_tools.*`** -- Events with a tracked `output_index` are dropped.
+2. **`response.mcp_call.*` / `response.mcp_list_tools.*` / `response.mcp_approval_request.*`** -- Events with a tracked `output_index` are dropped.
 3. **`response.output_item.done`** -- Matching items are dropped and the index tracking is cleared.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/goose-example-tool-merging.md` around lines 82 - 84, Update the docs to
state that events matching tracked output_index are dropped for approval
requests as well: in the list replace or extend bullet 2 to include
response.mcp_approval_request.* alongside response.mcp_call.* and
response.mcp_list_tools.*, and ensure the surrounding bullets referencing
response.output_item.added and response.output_item.done still describe
tracking/clearing of the output_index for server_label-matched MCP items.

Comment on lines +465 to +528
def _is_server_mcp_output_item(
    item: dict[str, Any], configured_mcp_labels: set[str]
) -> bool:
    """Check if a serialized output item is a server-deployed MCP tool call.

    Args:
        item: A dict from the serialized response output array.
        configured_mcp_labels: Set of server_label names configured in LCS.

    Returns:
        True if the item is an MCP call/list/approval from a server-deployed MCP server.
    """
    item_type = item.get("type")
    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        return item.get("server_label") in configured_mcp_labels
    return False


def _should_filter_mcp_chunk(
    chunk: OpenAIResponseObjectStream,
    event_type: Optional[str],
    configured_mcp_labels: set[str],
    server_mcp_output_indices: set[int],
) -> bool:
    """Check if a streaming chunk is a server-deployed MCP event that should be filtered.

    Args:
        chunk: The streaming chunk to check.
        event_type: The event type of the chunk.
        configured_mcp_labels: Set of server_label names configured in LCS.
        server_mcp_output_indices: Tracked output indices of server-deployed MCP calls.

    Returns:
        True if the chunk should be filtered out from the client stream.
    """
    if event_type == "response.output_item.added":
        item_added_chunk = cast(OutputItemAddedChunk, chunk)
        item = item_added_chunk.item
        item_type = getattr(item, "type", None)
        if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
            server_label = getattr(item, "server_label", None)
            if server_label in configured_mcp_labels:
                server_mcp_output_indices.add(item_added_chunk.output_index)
                return True

    if event_type and (
        event_type.startswith("response.mcp_call.")
        or event_type.startswith("response.mcp_list_tools.")
        or event_type.startswith("response.mcp_approval_request.")
    ):
        output_index = getattr(chunk, "output_index", None)
        if output_index in server_mcp_output_indices:
            return True

    if event_type == "response.output_item.done":
        item_done_chunk = cast(OutputItemDoneChunk, chunk)
        item = item_done_chunk.item
        item_type = getattr(item, "type", None)
        if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
            if item_done_chunk.output_index in server_mcp_output_indices:
                server_mcp_output_indices.discard(item_done_chunk.output_index)
                return True

    return False

⚠️ Potential issue | 🟠 Major

Align MCP stream filtering with is_server_deployed_output().

_finalize_response() on Line 557 uses is_server_deployed_output(), which already treats unlabeled mcp_call items as server-deployed. This helper and the response.output_item.added tracking branch only accept label-matched MCP items, so Line 644 leaves an unlabeled server mcp_call in response.completed and its substream events are never tracked or filtered.

Proposed fix
 def _is_server_mcp_output_item(
     item: dict[str, Any], configured_mcp_labels: set[str]
 ) -> bool:
     """Check if a serialized output item is a server-deployed MCP tool call."""
     item_type = item.get("type")
+    if item_type == "mcp_call" and item.get("server_label") is None:
+        return True
     if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
         return item.get("server_label") in configured_mcp_labels
     return False
@@
     if event_type == "response.output_item.added":
         item_added_chunk = cast(OutputItemAddedChunk, chunk)
         item = item_added_chunk.item
         item_type = getattr(item, "type", None)
         if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
             server_label = getattr(item, "server_label", None)
+            if item_type == "mcp_call" and server_label is None:
+                server_mcp_output_indices.add(item_added_chunk.output_index)
+                return True
             if server_label in configured_mcp_labels:
                 server_mcp_output_indices.add(item_added_chunk.output_index)
                 return True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 465 - 528, The MCP filtering
helpers currently only treat labeled MCP items as server-deployed, but
is_server_deployed_output() considers unlabeled "mcp_call" as server-deployed;
update _is_server_mcp_output_item and _should_filter_mcp_chunk to mirror that
logic by treating an MCP item with no server_label (None/absent) as
server-deployed as well as those whose server_label is in configured_mcp_labels.
Concretely, change the label checks in _is_server_mcp_output_item, the
response.output_item.added branch in _should_filter_mcp_chunk (the
item_type/server_label check and adding to server_mcp_output_indices), and the
response.output_item.done branch to also accept server_label is None for
"mcp_call" (and the other MCP types if intended) so unlabeled server-deployed
MCP streams are tracked and filtered.

Comment on lines 1473 to +1489
vector_store_ids = extract_vector_store_ids_from_tools(tools)
# Translate user-facing rag_ids to llama-stack vector_store_ids in each file_search tool
byok_rags = configuration.configuration.byok_rag
prepared_tools = translate_tools_vector_store_ids(tools, byok_rags)

# Optionally merge server-configured tools (RAG, MCP) with client tools
if merge_server_tools:
server_tools = await prepare_tools(
client=client,
vector_store_ids=vector_store_ids,
no_tools=no_tools,
token=token,
mcp_headers=mcp_headers,
request_headers=request_headers,
)
if server_tools:
prepared_tools = _merge_tools(prepared_tools, server_tools)

⚠️ Potential issue | 🟠 Major

Don't suppress server RAG when the client has no file_search.

At Line 1473, extract_vector_store_ids_from_tools() returns [] if the request only contains function/MCP tools. Passing that empty list into prepare_tools() at Line 1482 is treated as an explicit override, so no server file_search is prepared. In practice, merged requests never receive server-configured RAG.

🩹 Minimal fix
```diff
         vector_store_ids = extract_vector_store_ids_from_tools(tools)
+        client_has_file_search = any(tool.type == "file_search" for tool in tools)
         # Translate user-facing rag_ids to llama-stack vector_store_ids in each file_search tool
         byok_rags = configuration.configuration.byok_rag
         prepared_tools = translate_tools_vector_store_ids(tools, byok_rags)

         # Optionally merge server-configured tools (RAG, MCP) with client tools
         if merge_server_tools:
             server_tools = await prepare_tools(
                 client=client,
-                vector_store_ids=vector_store_ids,
+                vector_store_ids=vector_store_ids if client_has_file_search else None,
                 no_tools=no_tools,
                 token=token,
                 mcp_headers=mcp_headers,
                 request_headers=request_headers,
             )
```
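The fix hinges on prepare_tools treating an empty list and None differently. A minimal stand-in illustrates the distinction; this stub is not the real prepare_tools, only a sketch of the semantics the finding assumes:

```python
# Stand-in for prepare_tools' vector_store_ids handling (illustrative only).
SERVER_DEFAULT_STORES = ["server-rag-store"]

def prepare_tools_stub(vector_store_ids=None):
    if vector_store_ids is None:
        # No client override: fall back to server-configured RAG stores.
        return SERVER_DEFAULT_STORES
    # An explicit list -- even an empty one -- overrides the server defaults,
    # which is why passing [] silently suppresses server file_search.
    return vector_store_ids
```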
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1473 - 1489: the code passes an empty
vector_store_ids (from extract_vector_store_ids_from_tools) into prepare_tools,
which treats it as an explicit override and suppresses server RAG. Change the
call so prepare_tools only receives vector_store_ids when the list is non-empty
(e.g., pass vector_store_ids=vector_store_ids if vector_store_ids else None, or
omit the arg), leaving the other args (client, no_tools, token, mcp_headers,
request_headers) unchanged. This ensures prepare_tools can still add server
file_search when the client provided none, while preserving
translate_tools_vector_store_ids/prepared_tools and the subsequent _merge_tools
behavior.

```python
await append_turn_items_to_conversation(
    client, api_params.conversation, user_input, latest_response_object.output
)
# Extract response metadata and persist conversation turn
```
Although this decomposition is valid, it is outside the scope of this PR. Please consider opening a separate PR for refactoring the handlers.


```python
for tool in client_tools:
    if isinstance(tool, InputToolMCP) and tool.server_label in server_mcp_labels:
        raise HTTPException(
```
Create a dedicated model in models/responses.py inheriting from AbstractErrorResponse rather than passing a plain detail string.
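A sketch of what such a model could look like; AbstractErrorResponse is stood in by a local dataclass here, and the class and field names are assumptions about the project's models/responses.py, not its actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AbstractErrorResponse:
    # Stand-in for the project's base error-response class.
    status_code: int
    detail: str

@dataclass
class ToolConflictErrorResponse(AbstractErrorResponse):
    """Hypothetical 409 response for a client tool clashing with a server tool."""
    status_code: int = 409
    detail: str = "Client tool conflicts with a server-configured tool"
    conflicting_label: Optional[str] = None
```

The endpoint would then raise with this model's payload instead of an ad-hoc detail string, so OpenAPI docs and clients see a consistent error schema.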

```python
            ),
        )
    if isinstance(tool, InputToolFileSearch) and has_server_file_search:
        raise HTTPException(
```
Here as well

```python
    return list(client_tools) + list(server_tools)


async def resolve_tool_choice(
```
Please rebase changes made to this function and adapt your changes to its new flow.

```python
prepared_tools = translate_tools_vector_store_ids(tools, byok_rags)

# Optionally merge server-configured tools (RAG, MCP) with client tools
if merge_server_tools:
```
This seems incorrect to me. prepare_tools should be called every time tools are not explicitly specified by the client; making it conditional on the merge header is not right.
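The flow the reviewer suggests could look like the sketch below; the function name and the exact precedence rules are stand-ins for illustration, not the project's implementation:

```python
# Illustrative resolution order: server tools are the default whenever the
# client supplies none, and the merge flag only matters when both exist.
def resolve_tools(client_tools, server_tools, merge_server_tools):
    if not client_tools:
        # No explicit client tools: always use server-configured tools,
        # regardless of the merge header.
        return list(server_tools)
    if merge_server_tools:
        # Merge header set: combine client and server tools.
        return list(client_tools) + list(server_tools)
    # Otherwise client tools explicitly replace server tools.
    return list(client_tools)
```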
