LCORE-1616 /v1/responses support for MCP tool merging (goose agent support)#1435

Open
dprince wants to merge 4 commits into lightspeed-core:main from dprince:merge_server_tools
Conversation

@dprince dprince commented Mar 30, 2026

Description

Add server-tool merging and filtering for Responses API

Introduce X-LCS-Merge-Server-Tools header that allows client-provided
tools to be merged with server-configured tools (RAG, MCP) instead of
replacing them. Tool conflicts (duplicate MCP server_label or duplicate
file_search) are rejected with a 409 error.
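From the client side, opting in is just an extra header on the request. A minimal sketch — the header name comes from this PR, but the payload shape and the `local-oc` tool are illustrative assumptions:

```python
MERGE_HEADER = "X-LCS-Merge-Server-Tools"

def build_responses_request(client_tools: list[dict]) -> tuple[dict, dict]:
    """Build headers and body for a /v1/responses call that opts into merging."""
    headers = {
        "Content-Type": "application/json",
        MERGE_HEADER: "true",  # ask LCS to merge server tools instead of replacing them
    }
    body = {
        "input": "diagnose the crash-looping pod in foo-staging",
        "tools": client_tools,  # client-local function / MCP tools
        "stream": True,
    }
    return headers, body

headers, body = build_responses_request(
    [{"type": "mcp", "server_label": "local-oc", "server_url": "http://localhost:9000"}]
)
assert headers[MERGE_HEADER] == "true"
```

Without the header, the request behaves as before: client tools replace server tools.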

Add is_server_deployed_output() to distinguish LCS-deployed tools from
client tools so only server tools are included in turn summaries,
metrics, and storage. Client tool output items are still returned in
the response.

Filter server-deployed MCP streaming events (mcp_call, mcp_list_tools,
mcp_approval_request) from the SSE stream so OpenAI-compatible clients
like Goose don't fail on unknown item types. Strip these items from the
response.completed output array as well.

Includes a doc/diagram of the goose (local agent) use case we are prototyping.

Type of change

  • Refactor
  • [x] New feature
  • Bug fix
  • CVE fix
  • Optimization
  • [x] Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: Claude

Checklist before requesting a review

  • [x] I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Added unit tests
  • Tested locally against a working lightspeed-stack installation using the /v1/responses API with Goose. Verified working MCP tools at both the local agent (Goose) level and via the lightspeed-stack (server) level. When no extra header was configured in Goose, only the client tools were active (previous behavior).

Summary by CodeRabbit

  • New Features

    • Optional merging of server-configured and client-provided tools via request header; conflicts return HTTP 409.
    • Streaming responses filter and hide server-deployed tool outputs from client event streams; final stored turn summaries include only server-deployed tool outputs while client tool outputs are returned to callers.
  • Documentation

    • New guide describing the hybrid tool execution model, merge header, conflict semantics, and streaming/filtering behavior.
  • Tests

    • Expanded coverage for merging, conflict handling, filtering, and server/client output classification.

coderabbitai bot commented Mar 30, 2026

Walkthrough

Adds a header-driven hybrid tool merging model for /v1/responses, conflict detection returning HTTP 409, SSE filtering to hide server-deployed MCP events from clients, extraction of response finalization into a helper, and updates to tool resolution and turn-summary logic to distinguish server-deployed vs client-provided outputs.

Changes

Cohort / File(s) Summary
Documentation
docs/goose-example-tool-merging.md
New doc describing header-driven merging (X-LCS-Merge-Server-Tools: true), conflict rules (MCP server_label, file_search), SSE filtering of server-deployed MCP events, and example flows.
Tool resolution & summaries
src/utils/responses.py
Added is_server_deployed_output() and internal _merge_tools(); resolve_tool_choice() now detects merge header and can merge server tools or raise HTTP 409 on conflicts; build_turn_summary() skips client-provided outputs for stored summaries.
Response streaming & finalization
src/app/endpoints/responses.py
SSE serialization uses model_dump(..., by_alias=True); added server-MCP detection and chunk-filtering logic to suppress server-deployed MCP substream events and strip server MCP items from streamed response.output; extracted finalization into _finalize_response().
Integration tests
tests/integration/endpoints/test_query_integration.py
Updated mocked MCP server_label and assertions to reflect server-deployed tool outputs being treated as server-side (fewer tool_calls included in stored response).
Unit tests
tests/unit/app/endpoints/test_responses.py, tests/unit/utils/test_responses.py
Added unit tests for _is_server_mcp_output_item, _should_filter_mcp_chunk, is_server_deployed_output, _merge_tools (conflict 409 cases), and resolve_tool_choice merge behavior.

Sequence Diagrams

sequenceDiagram
    participant Client
    participant Server as LCS Server
    participant ServerMCP as Server MCP
    participant ClientMCP as Client MCP

    Client->>Server: POST /v1/responses\ntools=[client_tools]\nX-LCS-Merge-Server-Tools: true
    Server->>ServerMCP: prepare_tools() (RAG + configured MCP)
    ServerMCP-->>Server: server_tools
    Note over Server: _merge_tools() checks conflicts:\n• MCP server_label collisions\n• file_search overlap
    alt Conflict Detected
        Server-->>Client: HTTP 409 Conflict
    else No Conflict
        Server->>Server: merged_tools = client_tools + server_tools
        Server-->>Client: Begin SSE streaming (merged tools available)
    end
sequenceDiagram
    participant Client
    participant Server as LCS Server
    participant MCP as Server-Deployed MCP
    participant Storage as Turn Summary & Storage

    Client->>Server: Request using merged tools
    Server->>MCP: Execute server-deployed tool calls
    MCP-->>Server: mcp_call / results
    Server->>Server: response_generator() streaming SSE
    Note over Server: Filter & hide server-deployed MCP item types\n(mcp_call, mcp_list_tools, mcp_approval_request)
    Server->>Client: SSE stream with server MCP events suppressed
    Server->>Server: _finalize_response()
    Note over Server: build_turn_summary() includes only server-deployed outputs
    Server->>Storage: Store summary/metrics with server tool outputs
    Server-->>Client: Complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'LCORE-1616 /v1/responses support for MCP tool merging (goose agent support)' directly and clearly describes the primary change: adding MCP tool merging support to the /v1/responses API endpoint with a focus on Goose agent compatibility.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



@dprince dprince changed the title /v1/responses support for MCP tool merging (goose agent support) LCORE-1616 /v1/responses support for MCP tool merging (goose agent support) Mar 30, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/goose-example-tool-merging.md`:
- Around line 95-142: The fenced code block containing the conversation flow
diagram (the block starting with "Goose                     LCS /v1/responses   
Llama Stack        MCP/RAG" and the opening triple backticks) is missing a
language specifier; update the opening fence from just ``` to ```text (or
```plaintext) so the diagram is treated as plain text in rendering and
syntax-highlighting. Locate the triple-backtick that begins the diagram and add
the specifier, leaving the rest of the block unchanged.
- Around line 14-50: The fenced ASCII diagram block in
docs/goose-example-tool-merging.md is missing a language specifier which linters
flag; update the opening fence for the diagram (the triple-backtick that
currently starts the block) to include a language token such as text or
plaintext (e.g., change ``` to ```text) so the ASCII diagram is annotated for
static analysis and rendering; ensure only the opening fence is changed and the
rest of the Goose Agent / OpenShift Cluster diagram content (the ASCII art)
remains untouched.

In `@src/utils/responses.py`:
- Around line 1126-1158: The is_server_deployed_output function currently treats
an MCPApprovalResponse (item_type "mcp_approval_response") as server-side
because it lacks a server_label; update is_server_deployed_output to explicitly
handle "mcp_approval_response" (e.g., return False to mark it as
client-provided) or implement lookup/propagation of server_label, and add a unit
test in TestIsServerDeployedOutput to cover the approval response case;
reference is_server_deployed_output and TestIsServerDeployedOutput when making
the change.

In `@tests/unit/utils/test_responses.py`:
- Around line 2773-2804: The test test_client_tools_with_merge_header only
asserts behavior when the header key uses uppercase "X-LCS-Merge-Server-Tools";
update the test (and the other affected tests around the same blocks) to also
call resolve_tool_choice with a lowercase header key "x-lcs-merge-server-tools"
and assert the same merged-tools behavior (same tools, tool_choice, etc.) so
header normalization won't cause regressions; locate the calls to
resolve_tool_choice and duplicate or parametrize them to include both header key
variants to validate lowercase handling.
- Around line 2626-2635: Add a complementary positive test that verifies
mcp_approval_request is classified server-side when the server label matches:
create a new test (e.g., test_mcp_approval_request_server_side) that mocks
configuration (mocker.patch("utils.responses.configuration")) with
mock_config.mcp_servers containing the target label (e.g., "client-mcp"), create
an item with item.type = "mcp_approval_request" and item.server_label =
"client-mcp", and assert is_server_deployed_output(item) is True; reference the
existing test_mcp_approval_request_client_side, the configuration mock, and the
is_server_deployed_output function to locate where to add this test.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f5204ab3-7d8c-490d-8111-8a64f71f83a1

📥 Commits

Reviewing files that changed from the base of the PR and between 1b68d36 and 20809d0.

📒 Files selected for processing (6)
  • docs/goose-example-tool-merging.md
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/unit/utils/test_responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use FastAPI imports from the fastapi module: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack client imports
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax str | int instead of Union[str, int] for type annotations
Use Optional[Type] for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for functions performing I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with @abstractmethod decorators for interface implementations
Use complete type annotations for all class attributes; avoid using Any type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed

Files:

  • src/app/endpoints/responses.py
  • src/utils/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (2)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
🪛 markdownlint-cli2 (0.22.0)
docs/goose-example-tool-merging.md

[warning] 14-14: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (13)
src/utils/responses.py (3)

1193-1195: LGTM! Server-deployed output filtering in turn summary.

The filter correctly excludes client-provided tool outputs from the turn summary, ensuring only server-deployed tool calls and results are stored in metrics and conversation history.


1363-1411: LGTM! Tool merge logic with conflict detection.

The implementation correctly:

  • Detects MCP conflicts by comparing server_label values
  • Detects file_search conflicts when both client and server provide one
  • Returns merged tools with client tools first
  • Uses appropriate HTTP 409 status code for conflicts

Minor observation: Line 1411 wraps inputs in list() unnecessarily since client_tools and server_tools are already typed as list[InputTool], but this has no functional impact.
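The merge/conflict contract can be sketched at the dict level; the real `_merge_tools()` operates on typed `InputTool` objects and raises `HTTPException` with status 409, so the names and exception here are stand-ins:

```python
class ToolConflictError(Exception):
    """Raised when client and server tools collide (maps to HTTP 409 in LCS)."""

def merge_tools(client_tools: list[dict], server_tools: list[dict]) -> list[dict]:
    """Merge client and server tools, rejecting duplicate MCP labels and file_search."""
    client_labels = {t["server_label"] for t in client_tools if t.get("type") == "mcp"}
    server_labels = {t["server_label"] for t in server_tools if t.get("type") == "mcp"}
    duplicates = client_labels & server_labels
    if duplicates:
        raise ToolConflictError(f"duplicate MCP server_label(s): {sorted(duplicates)}")

    client_fs = any(t.get("type") == "file_search" for t in client_tools)
    server_fs = any(t.get("type") == "file_search" for t in server_tools)
    if client_fs and server_fs:
        raise ToolConflictError("both client and server provide file_search")

    return client_tools + server_tools  # client tools first, as the review notes

merged = merge_tools(
    [{"type": "mcp", "server_label": "local-oc"}],
    [{"type": "mcp", "server_label": "server1"}, {"type": "file_search"}],
)
```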


1449-1476: LGTM! Header-driven merge behavior.

The merge logic correctly:

  • Checks for X-LCS-Merge-Server-Tools: true header (case-insensitive)
  • Only merges when the header is explicitly set
  • Preserves original behavior when header is absent
  • Reuses the same vector_store_ids resolved from client tools when preparing server tools
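A minimal sketch of the case-insensitive header check; FastAPI's `request.headers` is already case-insensitive, so a plain dict lowered by hand stands in for it here:

```python
def merge_requested(headers: dict[str, str]) -> bool:
    """True when the client explicitly asked for server-tool merging."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-lcs-merge-server-tools", "").strip().lower() == "true"

assert merge_requested({"X-LCS-Merge-Server-Tools": "true"})
assert merge_requested({"x-lcs-merge-server-tools": "TRUE"})
assert not merge_requested({})          # absent header preserves old behavior
assert not merge_requested({"X-LCS-Merge-Server-Tools": "false"})
```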
tests/integration/endpoints/test_query_integration.py (2)

731-736: LGTM! Test expectations updated for server-deployed tool filtering.

The assertions correctly reflect the new behavior where:

  • Only server-deployed tool calls are included in response.tool_calls
  • function_call items (client-provided) are excluded
  • The expected count is reduced from 2 to 1

The comment clearly explains the reasoning.


635-635: No issues found: server1 is a valid MCP server name configured in test fixtures.

Verification confirms the server_label change to "server1" aligns with the MCP server configuration in the test fixtures (tests/integration/test_configuration.py lines 85-93 define server1 as a valid MCP server). The test setup correctly pairs this label with the mcp_list_tools item, and the assertions properly validate server-deployed tool classification.

tests/unit/app/endpoints/test_responses.py (1)

1378-1427: LGTM! Comprehensive tests for _is_server_mcp_output_item.

The test class provides good coverage including:

  • MCP types with matching and non-matching server labels
  • All three MCP event types (mcp_call, mcp_list_tools, mcp_approval_request)
  • Missing server_label handling
  • Non-MCP types returning False
  • Empty configured labels edge case
src/app/endpoints/responses.py (5)

405-407: LGTM! Added by_alias=True for SSE serialization.

Using by_alias=True ensures field names are serialized according to their aliases (e.g., created_at vs createdAt), maintaining consistency with the OpenAI Responses specification for SSE events.

Also applies to: 423-425, 435-435, 451-453


465-480: LGTM! Helper for filtering serialized MCP output items.

The function correctly identifies server-deployed MCP items from the serialized response.output array by checking:

  1. Item type is one of the MCP types
  2. server_label matches a configured MCP server

This complements is_server_deployed_output from utils/responses.py which operates on typed objects rather than dicts.
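An approximate, dict-level version of that predicate — the helper name and the source of configured labels are assumptions:

```python
MCP_ITEM_TYPES = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}

def is_server_mcp_output_item(item: dict, configured_labels: set[str]) -> bool:
    """True when a serialized output item came from an LCS-configured MCP server."""
    return (
        item.get("type") in MCP_ITEM_TYPES
        and item.get("server_label") in configured_labels
    )

assert is_server_mcp_output_item({"type": "mcp_call", "server_label": "server1"}, {"server1"})
assert not is_server_mcp_output_item({"type": "mcp_call", "server_label": "local-oc"}, {"server1"})
assert not is_server_mcp_output_item({"type": "message"}, {"server1"})
```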


483-527: LGTM! Streaming chunk filter with output index tracking.

The implementation correctly:

  • Tracks output_index values when server MCP items are added
  • Filters intermediate events (response.mcp_call.*, response.mcp_list_tools.*) by tracked indices
  • Cleans up tracking when output_item.done is received using discard() (safe for missing keys)
  • Mutates server_mcp_output_indices as a side effect, which is documented via the function's purpose
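The index-tracking filter can be sketched with simplified dict events; the real implementation works on typed SSE chunks, so the event shapes below are assumptions:

```python
MCP_ITEM_TYPES = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}

def should_filter_chunk(chunk: dict, tracked: set[int], server_labels: set[str]) -> bool:
    """Return True when this SSE chunk belongs to a server-deployed MCP item.

    Mutates `tracked` as a side effect: indices are added when a server MCP
    item appears and discarded when its output_item.done arrives.
    """
    etype = chunk.get("type", "")
    if etype == "response.output_item.added":
        item = chunk.get("item", {})
        if item.get("type") in MCP_ITEM_TYPES and item.get("server_label") in server_labels:
            tracked.add(chunk["output_index"])  # start suppressing this index
            return True
        return False
    if etype == "response.output_item.done":
        idx = chunk.get("output_index")
        if idx in tracked:
            tracked.discard(idx)  # discard() is safe even if idx was never tracked
            return True
        return False
    # intermediate events (response.mcp_call.*, response.mcp_list_tools.*)
    # are filtered when their output_index is being tracked
    return chunk.get("output_index") in tracked

tracked: set[int] = set()
added = {"type": "response.output_item.added", "output_index": 0,
         "item": {"type": "mcp_call", "server_label": "server1"}}
assert should_filter_chunk(added, tracked, {"server1"})
```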

530-575: LGTM! Extracted finalization logic into helper.

The _finalize_response helper:

  • Consolidates post-stream metadata extraction
  • Uses is_server_deployed_output to filter tool summaries (consistent with non-streaming path)
  • Handles conversation persistence with append_turn_items_to_conversation

This improves readability of response_generator by separating streaming from finalization concerns.


636-644: LGTM! Filtering server MCP items from response output.

The inline list comprehension correctly removes server-deployed MCP items from the output array in the serialized response, ensuring clients only receive item types they understand (message, function_call, etc.).
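Sketched on a serialized payload (labels and item contents are invented for illustration):

```python
MCP_ITEM_TYPES = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}
SERVER_LABELS = {"server1"}  # stand-in for the configured MCP servers

output = [
    {"type": "mcp_list_tools", "server_label": "server1"},
    {"type": "function_call", "name": "oc_get_pods"},
    {"type": "message", "content": "diagnosis..."},
]
# strip server-deployed MCP items; clients like Goose only see types they understand
visible = [
    item for item in output
    if not (item["type"] in MCP_ITEM_TYPES
            and item.get("server_label") in SERVER_LABELS)
]
```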

docs/goose-example-tool-merging.md (1)

1-92: LGTM! Clear documentation for hybrid tool architecture.

The documentation provides excellent coverage of:

  • The separation of knowledge (server) vs action (client) tools
  • Header-driven merge behavior with clear behavior table
  • Conflict detection semantics
  • SSE streaming filter mechanics with step-by-step explanation
  • Turn summary behavior for server vs client outputs

The architecture and flow diagrams effectively illustrate the concepts.

tests/unit/utils/test_responses.py (1)

2576-2738: Good coverage on classification and conflict behavior.

The new TestIsServerDeployedOutput and TestMergeTools cases are well-targeted and align with the merge/conflict contract (including 409 behavior).

Comment on lines +14 to +50
```
OpenShift Cluster
┌────────────────────────────────────────────────────────────────────────────┐
│ │
│ namespace: foo-staging │
│ ┌─────────────────────────────────────┐ │
│ │ Goose Agent (Pod) │ │
│ │ │ │
│ │ ServiceAccount: goose-readonly │ │
│ │ (read-only, limited k8s/OCP access)│ │
│ │ │ HTTPS POST /v1/responses │
│ │ LLM reasoning loop │ │
│ │ ├─ LCS /v1/responses ──────────────┼─────────────────────┐ │
│ │ │ (knowledge queries) │ │ │
│ │ │ │ ▼ │
│ │ ├─ Local MCP tools │ namespace: lightspeed-stack │
│ │ │ (cluster read-only operations) │ ┌─────────────────────────┐ │
│ │ │ ├─ oc get pods │ │ Lightspeed Stack (LCS) │ │
│ │ │ ├─ oc logs │ │ │ │
│ │ │ └─ oc describe │ │ Llama Stack Engine │ │
│ │ └──────────────────────────────────┘ │ ├─ file_search (RAG) │ │
│ └──────────┬─────────────────────────── │ ├─ mcp: OKP/Solr │ │
│ │ ServiceAccount token │ ├─ mcp: errata │ │
│ │ (read-only RBAC) │ └─ mcp: bugzilla │ │
│ │ └───────────┬─────────────┘ │
│ │ ┌─────┴──────┐ │
│ │ │FAISS / Solr│ │
│ │ └────────────┘ │
│ │ │
│ │ namespace: foo-staging │
│ │ ┌───────────────────────────┐ │
│ └─▶│ pods, deployments, ... │ │
│ │ (read-only RBAC enforced)│ │
│ └───────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────────┘
```
🧹 Nitpick | 🔵 Trivial

Add language specifier to fenced code block.

Static analysis flagged this code block as missing a language specifier. For ASCII diagrams, use text or plaintext to satisfy linters and improve rendering hints.

-```
+```text
 OpenShift Cluster
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 14-14: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/goose-example-tool-merging.md` around lines 14 - 50, The fenced ASCII
diagram block in docs/goose-example-tool-merging.md is missing a language
specifier which linters flag; update the opening fence for the diagram (the
triple-backtick that currently starts the block) to include a language token
such as text or plaintext (e.g., change ``` to ```text) so the ASCII diagram is
annotated for static analysis and rendering; ensure only the opening fence is
changed and the rest of the Goose Agent / OpenShift Cluster diagram content (the
ASCII art) remains untouched.

Comment on lines +95 to +142
```
Goose LCS /v1/responses Llama Stack MCP/RAG
│ │ │ │
│ "pod api-7f8b9 is crash- │ │ │
│ looping in foo-staging, │ │ │
│ help me diagnose it" │ │ │
│ │ │ │
│ POST {input, tools, │ │ │
│ X-LCS-Merge-Server-Tools} │ │ │
│────────────────────────────▶│ │ │
│ │ merge client + │ │
│ │ server tools │ │
│ │ │ │
│ │ responses.create() │ │
│ │──────────────────────▶│ │
│ │ │ LLM: search RAG│
│ │ │───────────────▶│
│ │ │◀── RAG chunks: │
│ │ │ known OOM fix │
│ │ │ for api image │
│ │ │ │
│ │ │ LLM: query OKP │
│ │ │───────────────▶│
│ │ │◀── KB article: │
│ │ │ memory limits │
│ │ │ best practice │
│ │ │ │
│ │◀──────────────────────│ answer with │
│ │ │ RAG + KB ctx │
│ SSE: response.completed │ │ │
│ (server MCP items filtered)│ │ │
│◀────────────────────────────│ │ │
│ │ │ │
│ Goose combines LCS knowledge with live cluster data: │
│ │ │ │
│──▶ LOCAL MCP: oc get pods -n foo-staging │
│◀── pod/api-7f8b9 CrashLoopBackOff │
│──▶ LOCAL MCP: oc logs pod/api-7f8b9 │
│◀── OOMKilled │
│──▶ LOCAL MCP: oc describe pod/api-7f8b9 │
│◀── events, resource limits │
│ │
│ Goose correlates live cluster state with RAG/OKP │
│ knowledge to produce a diagnosis: │
│ "Pod is OOMKilled. RAG docs confirm a known fix │
│ for this image -- raise memory limit to 512Mi. │
│ See KB article for memory tuning best practices." │
```
🧹 Nitpick | 🔵 Trivial

Add language specifier to fenced code block.

Same as above, add text or plaintext language specifier for the conversation flow diagram.

-```
+```text
 Goose                     LCS /v1/responses        Llama Stack        MCP/RAG
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/goose-example-tool-merging.md` around lines 95 - 142, The fenced code
block containing the conversation flow diagram (the block starting with "Goose  
LCS /v1/responses        Llama Stack        MCP/RAG" and the opening triple
backticks) is missing a language specifier; update the opening fence from just
``` to ```text (or ```plaintext) so the diagram is treated as plain text in
rendering and syntax-highlighting. Locate the triple-backtick that begins the
diagram and add the specifier, leaving the rest of the block unchanged.

Comment on lines +1126 to +1158
```python
def is_server_deployed_output(output_item: ResponseOutput) -> bool:
    """Check if a response output item belongs to a tool deployed by LCS.

    In the hybrid architecture clients may provide their own tools (function
    tools or MCP servers running locally) alongside server-configured tools.
    This function identifies items that belong to LCS-deployed tools so that
    only those are included in server-side processing (turn summary, metrics,
    storage). Client tool output items are still returned in the response
    to the caller but are not processed internally.

    Args:
        output_item: A ResponseOutput item from the response.

    Returns:
        True if the item should be processed by LCS, False for client tools.
    """
    item_type = getattr(output_item, "type", None)

    # function_call items are always from client-provided tools;
    # LCS only configures file_search and mcp tools.
    if item_type == "function_call":
        return False

    # MCP items: check server_label against configured servers
    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        server_label = getattr(output_item, "server_label", None)
        if server_label is not None:
            configured_labels = {s.name for s in configuration.mcp_servers}
            return server_label in configured_labels

    # file_search_call, web_search_call, message, and unknown types
    # are treated as server-side.
    return True
```
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if mcp_approval_response has server_label attribute in the llama_stack_api types
rg -n "mcp_approval_response" --type=py -C5

Repository: lightspeed-core/lightspeed-stack

Length of output: 7880


🏁 Script executed:

#!/bin/bash
# Find MCPApprovalResponse definition and where approval responses are created
rg -n "class MCPApprovalResponse|MCPApprovalResponse.*=" --type=py -A3 | head -50

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Search for where mcp_approval_response items originate/are constructed
rg -n "MCPApprovalResponse\(" --type=py -B2 -A2

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Check if MCPApprovalResponse has server_label attribute
rg -n "approval_request_id|server_label" src/utils/responses.py | grep -A2 -B2 "820\|821\|822"

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Find MCPApprovalResponse import and usage
rg -n "MCPApprovalResponse" --type=py | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 732


🏁 Script executed:

#!/bin/bash
# Look at the build_tool_call_summary function more closely to understand MCPApprovalResponse structure
sed -n '815,835p' src/utils/responses.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 774


🏁 Script executed:

#!/bin/bash
# Check all response output types handled in is_server_deployed_output context
rg -n "type.*=.*\"mcp" --type=py | head -30

Repository: lightspeed-core/lightspeed-stack

Length of output: 2390


🏁 Script executed:

#!/bin/bash
# Check imports at top of responses.py to understand MCPApprovalResponse
head -50 src/utils/responses.py | grep -A5 -B5 "MCPApprovalResponse"

Repository: lightspeed-core/lightspeed-stack

Length of output: 439


🏁 Script executed:

#!/bin/bash
# Search for any mentions of server_label in approval context
rg -n "server_label" src/utils/responses.py | grep -B3 -A3 "approval"

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Check test for is_server_deployed_output to see if mcp_approval_response is tested
rg -n "is_server_deployed_output" --type=py -B2 -A10

Repository: lightspeed-core/lightspeed-stack

Length of output: 12041


🏁 Script executed:

#!/bin/bash
# Look for the full test class to see if there's a test for mcp_approval_response
sed -n '2576,2661p' tests/unit/utils/test_responses.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 3657


🏁 Script executed:

#!/bin/bash
# Check the MCPApprovalResponse type to see what attributes it has
rg -n "class.*MCPApprovalResponse|@dataclass.*approval" --type=py -A15 | head -50

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Check if approval_request_id can be used to track origin of approval responses
rg -n "approval_request_id" src/utils/responses.py -B3 -A3

Repository: lightspeed-core/lightspeed-stack

Length of output: 410


🏁 Script executed:

#!/bin/bash
# Check if there's a way to track which approval_request originated from which server
rg -n "approval_request_id" src/utils/ -A2 -B2 | head -80

Repository: lightspeed-core/lightspeed-stack

Length of output: 856


🏁 Script executed:

#!/bin/bash
# Look for where mcp_approval_response items are created or generated to see if there's origin tracking
rg -n "mcp_approval_response" src/ --type=py | grep -v test | grep -v ".pyc"

Repository: lightspeed-core/lightspeed-stack

Length of output: 373


🏁 Script executed:

#!/bin/bash
# Check if there's any validation or origin tracking for approval requests/responses
rg -n "MCPApprovalRequest|MCPApprovalResponse" src/ --type=py -A5 | grep -E "server_label|origin|source" | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 124


Add explicit handling for mcp_approval_response type or document design assumption.

The function checks mcp_approval_request but not mcp_approval_response. Since MCPApprovalResponse (imported from llama_stack_api.openai_responses) lacks a server_label attribute and only carries approval_request_id, this type falls through to the default return True, classifying all approval responses as server-deployed regardless of origin.

If client-provided MCP servers can emit approval responses, they would be incorrectly included in server-side processing (turn summary, storage) due to the missing origin check. Either:

  • Add server_label tracking to approval responses so they can be classified by origin, or
  • Document that approval responses are always treated as server-side by design (with rationale), or
  • Add test coverage for this scenario to TestIsServerDeployedOutput
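The first option above can be sketched as follows. This is a hypothetical illustration, not the project's code: the configuration stand-in, the SimpleNamespace items, and the exact branch layout of is_server_deployed_output are assumptions; the key point is giving "mcp_approval_response" an explicit branch instead of letting it hit the default return True.

```python
# Hypothetical sketch: classify "mcp_approval_response" explicitly
# rather than letting it fall through to the default `return True`.
from types import SimpleNamespace

# Stand-in for utils.responses.configuration with one server-side MCP server.
configuration = SimpleNamespace(mcp_servers=[SimpleNamespace(name="server-mcp")])


def is_server_deployed_output(item) -> bool:
    """Classify a Responses output item as server-deployed or client-provided."""
    configured_labels = {s.name for s in configuration.mcp_servers}
    item_type = getattr(item, "type", None)

    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        # These item types carry a server_label we can match against LCS config.
        return getattr(item, "server_label", None) in configured_labels

    if item_type == "mcp_approval_response":
        # MCPApprovalResponse only carries approval_request_id, so its origin
        # cannot be checked here; treat it as client-provided rather than
        # silently defaulting to server-side.
        return False

    # Everything else (messages, reasoning, etc.) is treated as server-side.
    return True


# Usage: an approval response no longer falls through to True.
resp = SimpleNamespace(type="mcp_approval_response", approval_request_id="req-1")
print(is_server_deployed_output(resp))  # False
```

Whether False is the right default depends on the design decision the review asks for; the sketch only shows that the type can be handled explicitly.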
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1126 - 1158, The
is_server_deployed_output function currently treats an MCPApprovalResponse
(item_type "mcp_approval_response") as server-side because it lacks a
server_label; update is_server_deployed_output to explicitly handle
"mcp_approval_response" (e.g., return False to mark it as client-provided) or
implement lookup/propagation of server_label, and add a unit test in
TestIsServerDeployedOutput to cover the approval response case; reference
is_server_deployed_output and TestIsServerDeployedOutput when making the change.

Comment on lines +2626 to +2635
def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
    """Test mcp_approval_request with unmatched label is client-side."""
    mock_config = mocker.Mock()
    mock_config.mcp_servers = []
    mocker.patch("utils.responses.configuration", mock_config)

    item = mocker.Mock()
    item.type = "mcp_approval_request"
    item.server_label = "client-mcp"
    assert is_server_deployed_output(item) is False
Contributor

🧹 Nitpick | 🔵 Trivial

Add a positive mcp_approval_request server-side classification test.

Line 2626 only validates the unmatched-label/client-side path. Please add the matching-label path too, since this type is part of the server-event filtering contract.

Proposed test addition
 class TestIsServerDeployedOutput:
@@
     def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
         """Test mcp_approval_request with unmatched label is client-side."""
@@
         assert is_server_deployed_output(item) is False
+
+    def test_mcp_approval_request_server_side_when_label_matches(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test mcp_approval_request with matching label is server-deployed."""
+        mock_config = mocker.Mock()
+        mock_server = mocker.Mock()
+        mock_server.name = "server-mcp"
+        mock_config.mcp_servers = [mock_server]
+        mocker.patch("utils.responses.configuration", mock_config)
+
+        item = mocker.Mock()
+        item.type = "mcp_approval_request"
+        item.server_label = "server-mcp"
+        assert is_server_deployed_output(item) is True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2626 - 2635, Add a
complementary positive test that verifies mcp_approval_request is classified
server-side when the server label matches: create a new test (e.g.,
test_mcp_approval_request_server_side) that mocks configuration
(mocker.patch("utils.responses.configuration")) with mock_config.mcp_servers
containing the target label (e.g., "client-mcp"), create an item with item.type
= "mcp_approval_request" and item.server_label = "client-mcp", and assert
is_server_deployed_output(item) is True; reference the existing
test_mcp_approval_request_client_side, the configuration mock, and the
is_server_deployed_output function to locate where to add this test.

Comment on lines +2773 to +2804
async def test_client_tools_with_merge_header(
    self, mocker: MockerFixture
) -> None:
    """Test client tools merged with server tools when header is set."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    mock_config = mocker.Mock()
    mock_config.configuration.byok_rag = []
    mocker.patch("utils.responses.configuration", mock_config)

    server_mcp = InputToolMCP(
        server_label="server-tool", server_url="http://server"
    )
    mocker.patch(
        "utils.responses.prepare_tools",
        new=mocker.AsyncMock(return_value=[server_mcp]),
    )

    client_tool = InputToolMCP(
        server_label="client-tool", server_url="http://client"
    )
    tools, tool_choice, _ = await resolve_tool_choice(
        tools=[client_tool],
        tool_choice=None,
        token="tok",
        request_headers={"X-LCS-Merge-Server-Tools": "true"},
    )
Contributor

⚠️ Potential issue | 🟡 Minor

Header-merge tests should also validate lowercase header keys.

Current assertions only use X-LCS-Merge-Server-Tools casing (Line 2803/2841/2899). Add a lowercase-key variant to prevent missing regressions when request headers are normalized before reaching resolve_tool_choice.

Proposed hardening for header-key coverage
 class TestResolveToolChoice:
@@
-    async def test_client_tools_with_merge_header(
-        self, mocker: MockerFixture
-    ) -> None:
+    @pytest.mark.parametrize(
+        "merge_header_key",
+        ["X-LCS-Merge-Server-Tools", "x-lcs-merge-server-tools"],
+    )
+    async def test_client_tools_with_merge_header(
+        self, mocker: MockerFixture, merge_header_key: str
+    ) -> None:
@@
         tools, tool_choice, _ = await resolve_tool_choice(
             tools=[client_tool],
             tool_choice=None,
             token="tok",
-            request_headers={"X-LCS-Merge-Server-Tools": "true"},
+            request_headers={merge_header_key: "true"},
         )

Also applies to: 2836-2842, 2895-2900

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2773 - 2804, The test
test_client_tools_with_merge_header only asserts behavior when the header key
uses uppercase "X-LCS-Merge-Server-Tools"; update the test (and the other
affected tests around the same blocks) to also call resolve_tool_choice with a
lowercase header key "x-lcs-merge-server-tools" and assert the same merged-tools
behavior (same tools, tool_choice, etc.) so header normalization won't cause
regressions; locate the calls to resolve_tool_choice and duplicate or
parametrize them to include both header key variants to validate lowercase
handling.

@tisnik
Contributor

tisnik commented Mar 30, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

ToolChoice none ignored

The no_tools flag (when tool_choice is none) is computed only in the branch where client tools are provided, but not passed to prepare_tools in the else branch. As a result, requests with no tools and tool_choice: "none" will still load server tools instead of disabling them.

    mcp_headers: Optional[McpHeaders] = None,
    request_headers: Optional[Mapping[str, str]] = None,
) -> tuple[Optional[list[InputTool]], Optional[ToolChoice], Optional[list[str]]]:
    """Resolve tools and tool_choice for the Responses API.

    When the request includes tools and the ``X-LCS-Merge-Server-Tools: true``
    header is set, client-provided tools are merged with server-configured
    tools (RAG, MCP).  Conflicts (e.g. a client MCP tool with the same
    server_label as a server-configured one, or duplicate file_search tools)
    are rejected with a 409 error.
    Without the header, client-provided tools are used as-is (original
    behavior).  When no tools are provided in the request, all
    LCORE-configured tools are loaded via prepare_tools.  The tool_choice
    "none" flag disables all tools.

    Args:
        tools: Tools from the request, or None to use LCORE-configured tools.
        tool_choice: Requested tool choice (e.g. auto, required, none) or None.
        token: User token for MCP/auth.
        mcp_headers: Optional MCP headers to propagate.
        request_headers: Optional request headers for tool resolution.

    Returns:
        A tuple of (prepared_tools, prepared_tool_choice, vector_store_ids):
        prepared_tools is the list of tools to use, or None if none configured;
        prepared_tool_choice is the resolved tool choice, or None when there
        are no tools; vector_store_ids is extracted from tools (in user-facing format)
        when provided, otherwise None.
    """
    prepared_tools: Optional[list[InputTool]] = None
    client = AsyncLlamaStackClientHolder().get_client()
    no_tools = (
        isinstance(tool_choice, ToolChoiceMode) and tool_choice == ToolChoiceMode.none
    )

    merge_server_tools = (
        request_headers is not None
        and request_headers.get("X-LCS-Merge-Server-Tools", "").lower() == "true"
    )

    if tools:  # explicitly specified in request
        # Per-request override of vector stores (user-facing rag_ids)
        vector_store_ids = extract_vector_store_ids_from_tools(tools)
        # Translate user-facing rag_ids to llama-stack vector_store_ids in each file_search tool
        byok_rags = configuration.configuration.byok_rag
        prepared_tools = translate_tools_vector_store_ids(tools, byok_rags)

        # Optionally merge server-configured tools (RAG, MCP) with client tools
        if merge_server_tools:
            server_tools = await prepare_tools(
                client=client,
                vector_store_ids=vector_store_ids,
                no_tools=no_tools,
                token=token,
                mcp_headers=mcp_headers,
                request_headers=request_headers,
            )
            if server_tools:
                prepared_tools = _merge_tools(prepared_tools, server_tools)
Incomplete MCP filtering

The _should_filter_mcp_chunk helper drops events for mcp_call and mcp_list_tools substreams but does not filter response.mcp_approval_request.* events. This may leak server-side approval request chunks to clients that cannot handle them.

def _should_filter_mcp_chunk(
    chunk: OpenAIResponseObjectStream,
    event_type: Optional[str],
    configured_mcp_labels: set[str],
    server_mcp_output_indices: set[int],
) -> bool:
    """Check if a streaming chunk is a server-deployed MCP event that should be filtered.

    Args:
        chunk: The streaming chunk to check.
        event_type: The event type of the chunk.
        configured_mcp_labels: Set of server_label names configured in LCS.
        server_mcp_output_indices: Tracked output indices of server-deployed MCP calls.

    Returns:
        True if the chunk should be filtered out from the client stream.
    """
    if event_type == "response.output_item.added":
        item_added_chunk = cast(OutputItemAddedChunk, chunk)
        item = item_added_chunk.item
        item_type = getattr(item, "type", None)
        if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
            server_label = getattr(item, "server_label", None)
            if server_label in configured_mcp_labels:
                server_mcp_output_indices.add(item_added_chunk.output_index)
                return True

    if event_type and (
        event_type.startswith("response.mcp_call.")
        or event_type.startswith("response.mcp_list_tools.")
    ):
        output_index = getattr(chunk, "output_index", None)
        if output_index in server_mcp_output_indices:
            return True

    if event_type == "response.output_item.done":
        item_done_chunk = cast(OutputItemDoneChunk, chunk)
        item = item_done_chunk.item
        item_type = getattr(item, "type", None)
        if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
            if item_done_chunk.output_index in server_mcp_output_indices:
                server_mcp_output_indices.discard(item_done_chunk.output_index)
                return True

    return False
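One way to close the gap the reviewer describes is to add the approval-request prefix to the substream filter. The sketch below is hedged: the prefix constant, the helper name, and the event-subtype suffix used in the example are illustrative, not the project's _should_filter_mcp_chunk; only the prefix-matching idea is the point.

```python
# Sketch: include "response.mcp_approval_request." alongside the
# mcp_call / mcp_list_tools substream prefixes so approval-request
# deltas for tracked server-deployed items are also dropped.
SERVER_MCP_EVENT_PREFIXES = (
    "response.mcp_call.",
    "response.mcp_list_tools.",
    "response.mcp_approval_request.",  # previously missing from the filter
)


def is_filtered_substream_event(event_type, output_index, server_mcp_output_indices):
    """Return True when a substream chunk belongs to a tracked server MCP item."""
    if event_type is None:
        return False
    # str.startswith accepts a tuple of prefixes.
    if not event_type.startswith(SERVER_MCP_EVENT_PREFIXES):
        return False
    return output_index in server_mcp_output_indices


# Usage: an approval-request delta at a tracked output index is dropped,
# while ordinary text deltas pass through. The ".arguments.delta" suffix
# here is illustrative.
tracked = {3}
print(is_filtered_substream_event("response.mcp_approval_request.arguments.delta", 3, tracked))  # True
print(is_filtered_substream_event("response.output_text.delta", 3, tracked))  # False
```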

@dprince dprince force-pushed the merge_server_tools branch from 20809d0 to f02648d on March 30, 2026 21:48
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 1378-1427: Add a regression test ensuring approval-request
substream events are filtered from client streams: in TestIsServerMcpOutputItem
add a new test (e.g., test_mcp_approval_request_stream_filtering) that simulates
a streaming chunk key starting with "response.mcp_approval_request." and asserts
the streaming filter logic rejects/filters that chunk so approval-request events
cannot leak to clients; reference the mcp approval path by using the
"response.mcp_approval_request.*" key and the existing helper
_is_server_mcp_output_item (and whatever streaming chunk filter function is used
in the responses code) to validate the chunk is filtered when appropriate.

In `@tests/unit/utils/test_responses.py`:
- Around line 2840-2860: Add a new unit test that mirrors
test_no_tools_uses_prepare_tools but calls resolve_tool_choice with
tool_choice="none" (and tools=None) and asserts that the patched prepare_tools
(utils.responses.prepare_tools) is invoked with no_tools=True; locate the test
around test_no_tools_uses_prepare_tools and use the same
AsyncLlamaStackClientHolder/resolve_tool_choice setup and mock_prepare
AsyncMock, then assert mock_prepare.assert_called_once_with(no_tools=True) (or
equivalent) to ensure prepare_tools receives the no_tools flag when
resolve_tool_choice(tool_choice="none") is used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d826784-c522-4d69-a050-222bc472dc2e

📥 Commits

Reviewing files that changed from the base of the PR and between 20809d0 and f02648d.

📒 Files selected for processing (2)
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / ci
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
🔇 Additional comments (3)
tests/unit/app/endpoints/test_responses.py (1)

15-20: Import addition is clean and consistent.

The new _is_server_mcp_output_item import (Line 16) fits the existing absolute internal import pattern and keeps the test module scope focused.

tests/unit/utils/test_responses.py (2)

2624-2633: Add a positive mcp_approval_request server-side classification test.

Line 2626 only validates the unmatched-label/client-side path. A matching-label test is needed since mcp_approval_request is part of the server-event filtering contract (lines 2585-2596 test both paths for mcp_call, lines 2611-2622 for mcp_list_tools).

Proposed test addition
 class TestIsServerDeployedOutput:
@@
     def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
         """Test mcp_approval_request with unmatched label is client-side."""
@@
         assert is_server_deployed_output(item) is False
+
+    def test_mcp_approval_request_server_side_when_label_matches(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test mcp_approval_request with matching label is server-deployed."""
+        mock_config = mocker.Mock()
+        mock_server = mocker.Mock()
+        mock_server.name = "server-mcp"
+        mock_config.mcp_servers = [mock_server]
+        mocker.patch("utils.responses.configuration", mock_config)
+
+        item = mocker.Mock()
+        item.type = "mcp_approval_request"
+        item.server_label = "server-mcp"
+        assert is_server_deployed_output(item) is True

2769-2802: Header-merge tests should validate lowercase header keys.

Current assertions only use X-LCS-Merge-Server-Tools casing (lines 2797, 2835, 2889). Add a lowercase-key variant to prevent missing regressions when request headers are normalized before reaching resolve_tool_choice.

Proposed hardening for header-key coverage
 class TestResolveToolChoice:
@@
-    async def test_client_tools_with_merge_header(
-        self, mocker: MockerFixture
-    ) -> None:
+    @pytest.mark.parametrize(
+        "merge_header_key",
+        ["X-LCS-Merge-Server-Tools", "x-lcs-merge-server-tools"],
+    )
+    async def test_client_tools_with_merge_header(
+        self, mocker: MockerFixture, merge_header_key: str
+    ) -> None:
@@
         tools, tool_choice, _ = await resolve_tool_choice(
             tools=[client_tool],
             tool_choice=None,
             token="tok",
-            request_headers={"X-LCS-Merge-Server-Tools": "true"},
+            request_headers={merge_header_key: "true"},
         )

Apply similar parametrization to test_merge_header_conflict_raises_409 and test_merge_header_no_server_tools_returns_client_only.

Comment on lines +2840 to +2860
async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
    """Test that no client tools falls through to prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
    mock_prepare = mocker.AsyncMock(return_value=[server_tool])
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)

    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice=None,
        token="tok",
    )
    assert tools is not None
    assert len(tools) == 1
    mock_prepare.assert_called_once()
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add test coverage for tool_choice="none" scenario.

The test only validates tool_choice=None (line 2855). Per the PR objectives, there's a known issue where tool_choice="none" may not correctly disable tools when tools=None. Add a test asserting that prepare_tools is called with no_tools=True when tool_choice="none".

Proposed test addition
     @pytest.mark.asyncio
     async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
         """Test that no client tools falls through to prepare_tools."""
@@
         assert len(tools) == 1
         mock_prepare.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_tool_choice_none_passes_no_tools_flag(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test that tool_choice='none' sets no_tools=True in prepare_tools."""
+        mock_client = mocker.AsyncMock()
+        mock_holder = mocker.Mock()
+        mock_holder.get_client.return_value = mock_client
+        mocker.patch(
+            "utils.responses.AsyncLlamaStackClientHolder",
+            return_value=mock_holder,
+        )
+        mock_prepare = mocker.AsyncMock(return_value=None)
+        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
+
+        tools, _, _ = await resolve_tool_choice(
+            tools=None,
+            tool_choice="none",
+            token="tok",
+        )
+        assert tools is None
+        mock_prepare.assert_called_once()
+        # Verify no_tools=True was passed
+        call_kwargs = mock_prepare.call_args.kwargs
+        assert call_kwargs.get("no_tools") is True
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
    """Test that no client tools falls through to prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
    mock_prepare = mocker.AsyncMock(return_value=[server_tool])
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice=None,
        token="tok",
    )
    assert tools is not None
    assert len(tools) == 1
    mock_prepare.assert_called_once()

async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
    """Test that no client tools falls through to prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
    mock_prepare = mocker.AsyncMock(return_value=[server_tool])
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice=None,
        token="tok",
    )
    assert tools is not None
    assert len(tools) == 1
    mock_prepare.assert_called_once()

@pytest.mark.asyncio
async def test_tool_choice_none_passes_no_tools_flag(
    self, mocker: MockerFixture
) -> None:
    """Test that tool_choice='none' sets no_tools=True in prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    mock_prepare = mocker.AsyncMock(return_value=None)
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice="none",
        token="tok",
    )
    assert tools is None
    mock_prepare.assert_called_once()
    # Verify no_tools=True was passed
    call_kwargs = mock_prepare.call_args.kwargs
    assert call_kwargs.get("no_tools") is True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2840 - 2860, Add a new unit
test that mirrors test_no_tools_uses_prepare_tools but calls resolve_tool_choice
with tool_choice="none" (and tools=None) and asserts that the patched
prepare_tools (utils.responses.prepare_tools) is invoked with no_tools=True;
locate the test around test_no_tools_uses_prepare_tools and use the same
AsyncLlamaStackClientHolder/resolve_tool_choice setup and mock_prepare
AsyncMock, then assert mock_prepare.assert_called_once_with(no_tools=True) (or
equivalent) to ensure prepare_tools receives the no_tools flag when
resolve_tool_choice(tool_choice="none") is used.

@dprince
Author

dprince commented Mar 30, 2026

@tisnik thanks for the review. I added a commit to address the 'incomplete MCP filtering'. For the 'no_tools' comment: I looked, and I think the no_tools variable is scoped to the entire function and is passed into both prepare_tools calls. Let me know if I am missing something.
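The scoping point can be illustrated with a toy sketch. Everything here is hypothetical stand-in code, not the project's resolve_tool_choice or prepare_tools; it only shows that a flag computed once at function scope reaches both call sites regardless of which branch runs.

```python
# Toy illustration of function-scoped `no_tools` feeding both branches.
import asyncio


async def prepare_tools(no_tools: bool):
    # Stand-in: the real prepare_tools loads RAG/MCP tools unless disabled.
    return None if no_tools else ["server-tool"]


async def resolve_tool_choice(tools, tool_choice):
    no_tools = tool_choice == "none"  # computed once, visible to both branches

    if tools:  # client tools provided: merge branch
        server_tools = await prepare_tools(no_tools=no_tools)
        return tools + (server_tools or [])
    # no client tools: fall through to server-configured tools
    return await prepare_tools(no_tools=no_tools)


# With tool_choice="none" and no client tools, nothing is loaded.
print(asyncio.run(resolve_tool_choice(None, "none")))  # None
```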

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 556-576: The persisted turn still contains client-side tool items
because append_turn_items_to_conversation is called with
latest_response_object.output; before calling append_turn_items_to_conversation
(after building turn_summary and rag_chunks) filter
latest_response_object.output to only include items that pass
is_server_deployed_output() (same predicate used earlier), and pass that
filtered list to append_turn_items_to_conversation so client-only MCP/function
outputs are not stored; reference latest_response_object.output,
is_server_deployed_output(), and append_turn_items_to_conversation() when making
the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7a365396-e85c-42bc-8ec0-e4067d6ace16

📥 Commits

Reviewing files that changed from the base of the PR and between f02648d and 2d6fea6.

📒 Files selected for processing (2)
  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use FastAPI imports from the fastapi module: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack client imports
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax str | int instead of Union[str, int] for type annotations
Use Optional[Type] for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for functions performing I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with @abstractmethod decorators for interface implementations
Use complete type annotations for all class attributes; avoid using Any type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed

Files:

  • src/app/endpoints/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/unit/app/endpoints/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
🔇 Additional comments (2)
tests/unit/app/endpoints/test_responses.py (1)

1466-1498: Good regression coverage for approval-request substreams.

This closes the response.mcp_approval_request.* hole that previously let unknown MCP events leak to OpenAI-compatible clients.

src/app/endpoints/responses.py (1)

531-537: The review comment is incorrect. Both code paths in update_azure_token() properly update the AsyncLlamaStackClientHolder's internal client:

  1. Library client mode (update_azure_token line 17): Calls reload_library_client(), which sets self._lsc = client (client.py:119)
  2. Service client mode (update_azure_token line 33): Calls update_provider_data(), which does self._lsc = self._lsc.copy(set_default_headers=updated_headers) (client.py:54)

Since AsyncLlamaStackClientHolder is a Singleton, when _finalize_response() calls get_client() at line 572, it retrieves the same holder instance that was just updated by update_azure_token(). The client will have fresh credentials—there is no staleness issue.

Comment on lines +556 to +576
for item in latest_response_object.output:
    if not is_server_deployed_output(item):
        continue
    tool_call, tool_result = build_tool_call_summary(item)
    if tool_call:
        turn_summary.tool_calls.append(tool_call)
    if tool_result:
        turn_summary.tool_results.append(tool_result)

tool_rag_chunks = parse_rag_chunks(
    latest_response_object,
    vector_store_ids,
    configuration.rag_id_mapping,
)
turn_summary.rag_chunks = inline_rag_context.rag_chunks + tool_rag_chunks

client = AsyncLlamaStackClientHolder().get_client()
if api_params.store and api_params.previous_response_id and latest_response_object:
    await append_turn_items_to_conversation(
        client, api_params.conversation, user_input, latest_response_object.output
    )
Contributor

⚠️ Potential issue | 🟠 Major

Filter client tool items before persisting the streamed turn.

Line 557 already gates internal processing with is_server_deployed_output(), but append_turn_items_to_conversation() on Lines 573-576 still receives the full latest_response_object.output. src/utils/conversations.py:454-497 stores every llm_output item it gets, so client-local MCP/function items are still persisted on the streaming path.

Proposed fix
     client = AsyncLlamaStackClientHolder().get_client()
     if api_params.store and api_params.previous_response_id and latest_response_object:
+        stored_output = [
+            item
+            for item in latest_response_object.output
+            if is_server_deployed_output(item)
+        ]
         await append_turn_items_to_conversation(
-            client, api_params.conversation, user_input, latest_response_object.output
+            client, api_params.conversation, user_input, stored_output
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 556 - 576, The persisted turn
still contains client-side tool items because append_turn_items_to_conversation
is called with latest_response_object.output; before calling
append_turn_items_to_conversation (after building turn_summary and rag_chunks)
filter latest_response_object.output to only include items that pass
is_server_deployed_output() (same predicate used earlier), and pass that
filtered list to append_turn_items_to_conversation so client-only MCP/function
outputs are not stored; reference latest_response_object.output,
is_server_deployed_output(), and append_turn_items_to_conversation() when making
the change.

dprince added 4 commits March 31, 2026 07:03
Introduce X-LCS-Merge-Server-Tools header that allows client-provided
tools to be merged with server-configured tools (RAG, MCP) instead of
replacing them. Tool conflicts (duplicate MCP server_label or duplicate
file_search) are rejected with a 409 error.

Add is_server_deployed_output() to distinguish LCS-deployed tools from
client tools so only server tools are included in turn summaries,
metrics, and storage. Client tool output items are still returned in
the response.

Filter server-deployed MCP streaming events (mcp_call, mcp_list_tools,
mcp_approval_request) from the SSE stream so OpenAI-compatible clients
like Goose don't fail on unknown item types. Strip these items from the
response.completed output array as well.
Document the Goose agent integration pattern where server-configured tools
(RAG, MCP knowledge services) are merged with client-provided tools
(oc/kubectl). Covers the X-LCS-Merge-Server-Tools header behavior,
conflict detection, server-tool filtering in streamed responses, and
an end-to-end conversation flow example.
Cover the new functions introduced in the server-tool merging feature:
- is_server_deployed_output: classify output items as server vs client
- _merge_tools: merge client and server tools with conflict detection
- resolve_tool_choice: X-LCS-Merge-Server-Tools header integration
- _is_server_mcp_output_item: filter serialized MCP items in streaming

29 new tests across utils/test_responses.py and endpoints/test_responses.py
validating correct behavior for MCP label matching, 409 conflict rejection,
file_search deduplication, and pass-through of non-MCP item types.
…lter_mcp_chunk

The helper already handled mcp_approval_request items in the output_item.added
and output_item.done branches, but the substream event prefix check only matched
response.mcp_call.* and response.mcp_list_tools.*, allowing server-side approval
request progress events to leak to clients that cannot handle them.

Add the missing response.mcp_approval_request.* prefix check and add unit tests
for _should_filter_mcp_chunk covering all three substream event types.
@dprince dprince force-pushed the merge_server_tools branch from 2d6fea6 to d0056d5 Compare March 31, 2026 11:04

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/responses.py (1)

1202-1215: ⚠️ Potential issue | 🟠 Major

Filter file_search metadata with the same server-only predicate.

Lines 1202-1204 and Line 1215 still parse the full response.output; the new guard at Lines 1207-1208 only protects tool_calls and tool_results. A client-provided file_search will still populate TurnSummary.referenced_documents and TurnSummary.rag_chunks, so those client results continue to be stored.
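
The fix this comment asks for can be sketched as pre-filtering `response.output` with the same predicate before the metadata parsers run. The shapes below are simplified stand-ins (plain dicts with a hypothetical `server_deployed` flag), not the real response types.

```python
def is_server_deployed_output(item: dict) -> bool:
    # Stand-in predicate: the real one inspects item type and server_label.
    return item.get("server_deployed", False)


def filter_server_outputs(output: list[dict]) -> list[dict]:
    """Keep only server-deployed items so downstream parsers never see client results."""
    return [item for item in output if is_server_deployed_output(item)]


output = [
    {"type": "file_search_call", "server_deployed": True},
    {"type": "file_search_call", "server_deployed": False},  # client-provided
]
# Passing server_outputs (instead of the full output) to the equivalents of
# parse_referenced_documents/parse_rag_chunks keeps client file_search results
# out of TurnSummary.referenced_documents and TurnSummary.rag_chunks.
server_outputs = filter_server_outputs(output)
```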

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1202 - 1215, The parsing of referenced
documents and RAG chunks must use the same server-only filter as tool
call/result processing: before calling parse_referenced_documents(response,
vector_store_ids, rag_id_mapping) and parse_rag_chunks(response,
vector_store_ids, rag_id_mapping), filter response.output using
is_server_deployed_output to produce a server_outputs list (or add an
overload/parameter to those functions to accept filtered output); then pass that
filtered output into parse_referenced_documents and parse_rag_chunks so
client-provided file_search entries are excluded when populating
TurnSummary.referenced_documents and TurnSummary.rag_chunks.
♻️ Duplicate comments (4)
tests/unit/utils/test_responses.py (1)

2840-2860: 🛠️ Refactor suggestion | 🟠 Major

Add the tool_choice="none" regression case.

These tests only exercise tools=None with tool_choice=None. The regression called out in the PR discussion was tools=None plus tool_choice="none" not propagating no_tools=True into prepare_tools(), so this suite still will not catch that branch.

Proposed test addition
     `@pytest.mark.asyncio`
     async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
         """Test that no client tools falls through to prepare_tools."""
@@
         assert tools is not None
         assert len(tools) == 1
         mock_prepare.assert_called_once()
+
+    `@pytest.mark.asyncio`
+    async def test_tool_choice_none_passes_no_tools_flag(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test that tool_choice='none' forwards no_tools=True to prepare_tools."""
+        mock_client = mocker.AsyncMock()
+        mock_holder = mocker.Mock()
+        mock_holder.get_client.return_value = mock_client
+        mocker.patch(
+            "utils.responses.AsyncLlamaStackClientHolder",
+            return_value=mock_holder,
+        )
+        mock_prepare = mocker.AsyncMock(return_value=None)
+        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
+
+        tools, _, _ = await resolve_tool_choice(
+            tools=None,
+            tool_choice="none",
+            token="tok",
+        )
+
+        assert tools is None
+        assert mock_prepare.call_args.kwargs["no_tools"] is True

As per coding guidelines, "Maintain test coverage of at least 60% for unit tests and 10% for integration tests".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2840 - 2860, Add a
regression test covering the case tools=None with tool_choice="none": copy or
extend the existing async test_no_tools_uses_prepare_tools and call
resolve_tool_choice with tools=None and tool_choice="none" (keeping the same
mocks for AsyncLlamaStackClientHolder and prepare_tools), then assert that
prepare_tools was awaited (mock_prepare.assert_called_once()) and that the
returned tools list matches the mocked server_tool; this ensures
resolve_tool_choice properly sets no_tools and passes it through into
prepare_tools.
docs/goose-example-tool-merging.md (1)

14-14: ⚠️ Potential issue | 🟡 Minor

Add language specifiers to both ASCII diagram fences.

Both fenced blocks are still bare triple backticks, so markdownlint will keep flagging MD040 on this page.

Proposed fix
-```
+```text
 OpenShift Cluster
-```
+```text
 Goose                     LCS /v1/responses        Llama Stack        MCP/RAG

Also applies to: 95-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/goose-example-tool-merging.md` at line 14, The two ASCII diagram fenced
code blocks use bare triple backticks and trigger markdownlint MD040; update
each opening fence (the lines with ```) to include a language specifier such as
"text" so they read ```text. Locate the two bare fences that wrap the diagrams
(the one before "OpenShift Cluster" and the one before "Goose                   
LCS /v1/responses        Llama Stack        MCP/RAG") and modify their opening
fences to ```text to silence MD040.
src/app/endpoints/responses.py (1)

556-576: ⚠️ Potential issue | 🟠 Major

Persist only server-deployed outputs on the streaming follow-up path.

Line 575 still passes latest_response_object.output wholesale into append_turn_items_to_conversation(). src/utils/conversations.py:454-494 persists every item it receives, so client-local function_call / MCP outputs are still written to the conversation store even though Lines 556-563 already skip them from the internal summary.

Proposed fix
     if api_params.store and api_params.previous_response_id and latest_response_object:
+        stored_output = [
+            item
+            for item in latest_response_object.output
+            if is_server_deployed_output(item)
+        ]
         await append_turn_items_to_conversation(
-            client, api_params.conversation, user_input, latest_response_object.output
+            client, api_params.conversation, user_input, stored_output
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 556 - 576, The call to
append_turn_items_to_conversation currently passes latest_response_object.output
directly, causing non-server outputs (e.g., function_call / MCP) to be
persisted; filter the outputs to only server-deployed items before calling
append_turn_items_to_conversation. Use the existing is_server_deployed_output
predicate (used earlier in the loop) to build a filtered list of outputs (e.g.,
persisted_outputs = [item for item in latest_response_object.output if
is_server_deployed_output(item)]) and pass that filtered list into
append_turn_items_to_conversation when invoking it via
AsyncLlamaStackClientHolder().get_client().
src/utils/responses.py (1)

1163-1171: ⚠️ Potential issue | 🟡 Minor

mcp_approval_response still bypasses the client/server split.

Lines 1163-1167 classify mcp_approval_request, but mcp_approval_response is not checked and falls through to the default True at Lines 1169-1171. A client-owned approval flow will therefore still create a server-side tool_result with no matching server tool_call. Track ownership for approval responses before the fallback, or explicitly exclude this type.
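
One way to close the gap is to classify approval responses by `server_label` ownership alongside the other MCP item types. This is a simplified sketch: the real predicate in `src/utils/responses.py` has more branches (for example, it also treats unlabeled `mcp_call` items as server-deployed).

```python
MCP_ITEM_TYPES = (
    "mcp_call",
    "mcp_list_tools",
    "mcp_approval_request",
    "mcp_approval_response",  # previously fell through to the default True
)


def is_server_deployed_output(item: dict, configured_labels: set[str]) -> bool:
    """Classify MCP items, approval responses included, by server_label ownership."""
    if item.get("type") in MCP_ITEM_TYPES:
        return item.get("server_label") in configured_labels
    return True  # non-MCP output items keep the existing default


labels = {"knowledge-mcp"}
client_owned = is_server_deployed_output(
    {"type": "mcp_approval_response", "server_label": "oc-local"}, labels
)
server_owned = is_server_deployed_output(
    {"type": "mcp_approval_response", "server_label": "knowledge-mcp"}, labels
)
```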

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1163 - 1171, The function that
determines server-side items currently checks item_type against ("mcp_call",
"mcp_list_tools", "mcp_approval_request") but omits "mcp_approval_response",
causing responses to fall through to the default True; update the conditional in
src/utils/responses.py (the branch using item_type, server_label,
configuration.mcp_servers, and output_item) to also handle
"mcp_approval_response" (or add an explicit branch for mcp_approval_response) so
the code checks server_label ownership for approval responses before the final
return True fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/goose-example-tool-merging.md`:
- Around line 82-84: Update the docs to state that events matching tracked
output_index are dropped for approval requests as well: in the list replace or
extend bullet 2 to include response.mcp_approval_request.* alongside
response.mcp_call.* and response.mcp_list_tools.*, and ensure the surrounding
bullets referencing response.output_item.added and response.output_item.done
still describe tracking/clearing of the output_index for server_label-matched
MCP items.

In `@src/app/endpoints/responses.py`:
- Around line 465-528: The MCP filtering helpers currently only treat labeled
MCP items as server-deployed, but is_server_deployed_output() considers
unlabeled "mcp_call" as server-deployed; update _is_server_mcp_output_item and
_should_filter_mcp_chunk to mirror that logic by treating an MCP item with no
server_label (None/absent) as server-deployed as well as those whose
server_label is in configured_mcp_labels. Concretely, change the label checks in
_is_server_mcp_output_item, the response.output_item.added branch in
_should_filter_mcp_chunk (the item_type/server_label check and adding to
server_mcp_output_indices), and the response.output_item.done branch to also
accept server_label is None for "mcp_call" (and the other MCP types if intended)
so unlabeled server-deployed MCP streams are tracked and filtered.

In `@src/utils/responses.py`:
- Around line 1473-1489: The code passes an empty vector_store_ids (from
extract_vector_store_ids_from_tools) into prepare_tools which is treated as an
explicit override and suppresses server RAG; change the call so prepare_tools
only receives vector_store_ids when the list is non-empty (e.g., pass
vector_store_ids=vector_store_ids if vector_store_ids else None or omit the
arg), leaving other args (client, no_tools, token, mcp_headers, request_headers)
unchanged; this ensures prepare_tools can still add server file_search when the
client provided none while preserving
translate_tools_vector_store_ids/prepared_tools and the subsequent _merge_tools
behavior.

---

Outside diff comments:
In `@src/utils/responses.py`:
- Around line 1202-1215: The parsing of referenced documents and RAG chunks must
use the same server-only filter as tool call/result processing: before calling
parse_referenced_documents(response, vector_store_ids, rag_id_mapping) and
parse_rag_chunks(response, vector_store_ids, rag_id_mapping), filter
response.output using is_server_deployed_output to produce a server_outputs list
(or add an overload/parameter to those functions to accept filtered output);
then pass that filtered output into parse_referenced_documents and
parse_rag_chunks so client-provided file_search entries are excluded when
populating TurnSummary.referenced_documents and TurnSummary.rag_chunks.

---

Duplicate comments:
In `@docs/goose-example-tool-merging.md`:
- Line 14: The two ASCII diagram fenced code blocks use bare triple backticks
and trigger markdownlint MD040; update each opening fence (the lines with ```)
to include a language specifier such as "text" so they read ```text. Locate the
two bare fences that wrap the diagrams (the one before "OpenShift Cluster" and
the one before "Goose                     LCS /v1/responses        Llama Stack  
MCP/RAG") and modify their opening fences to ```text to silence MD040.

In `@src/app/endpoints/responses.py`:
- Around line 556-576: The call to append_turn_items_to_conversation currently
passes latest_response_object.output directly, causing non-server outputs (e.g.,
function_call / MCP) to be persisted; filter the outputs to only server-deployed
items before calling append_turn_items_to_conversation. Use the existing
is_server_deployed_output predicate (used earlier in the loop) to build a
filtered list of outputs (e.g., persisted_outputs = [item for item in
latest_response_object.output if is_server_deployed_output(item)]) and pass that
filtered list into append_turn_items_to_conversation when invoking it via
AsyncLlamaStackClientHolder().get_client().

In `@src/utils/responses.py`:
- Around line 1163-1171: The function that determines server-side items
currently checks item_type against ("mcp_call", "mcp_list_tools",
"mcp_approval_request") but omits "mcp_approval_response", causing responses to
fall through to the default True; update the conditional in
src/utils/responses.py (the branch using item_type, server_label,
configuration.mcp_servers, and output_item) to also handle
"mcp_approval_response" (or add an explicit branch for mcp_approval_response) so
the code checks server_label ownership for approval responses before the final
return True fallback.

In `@tests/unit/utils/test_responses.py`:
- Around line 2840-2860: Add a regression test covering the case tools=None with
tool_choice="none": copy or extend the existing async
test_no_tools_uses_prepare_tools and call resolve_tool_choice with tools=None
and tool_choice="none" (keeping the same mocks for AsyncLlamaStackClientHolder
and prepare_tools), then assert that prepare_tools was awaited
(mock_prepare.assert_called_once()) and that the returned tools list matches the
mocked server_tool; this ensures resolve_tool_choice properly sets no_tools and
passes it through into prepare_tools.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2c6dc5e3-b90c-4e1e-999d-b9d851150345

📥 Commits

Reviewing files that changed from the base of the PR and between 2d6fea6 and d0056d5.

📒 Files selected for processing (6)
  • docs/goose-example-tool-merging.md
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/unit/utils/test_responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use FastAPI imports from the fastapi module: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack client imports
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax str | int instead of Union[str, int] for type annotations
Use Optional[Type] for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for functions performing I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with @abstractmethod decorators for interface implementations
Use complete type annotations for all class attributes; avoid using Any type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed

Files:

  • src/app/endpoints/responses.py
  • src/utils/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (2)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
🪛 markdownlint-cli2 (0.22.0)
docs/goose-example-tool-merging.md

[warning] 14-14: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (2)
src/utils/responses.py (2)

1397-1424: Conflict handling here looks good.

Failing fast on duplicate MCP server_label and duplicate file_search avoids silent shadowing and gives callers a deterministic 409.


1496-1503: Good fix for tool_choice="none" without explicit tools.

Passing no_tools into the fallback prepare_tools() call preserves the expected "no tools" behavior when the request omits tools.
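
The fallback path described here can be sketched with heavily simplified signatures (the real functions take a client, token, and headers):

```python
import asyncio


async def prepare_tools(no_tools: bool = False):
    # Stand-in for the server-side helper: it honors the no_tools flag.
    return None if no_tools else [{"type": "file_search"}]


async def resolve_tool_choice(tools, tool_choice):
    """When the request omits tools, forward no_tools so 'none' really disables them."""
    if tools is None:
        return await prepare_tools(no_tools=(tool_choice == "none"))
    return tools


none_result = asyncio.run(resolve_tool_choice(None, "none"))
default_result = asyncio.run(resolve_tool_choice(None, None))
```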

Comment on lines +82 to +84
1. **`response.output_item.added`** -- If the item is an `mcp_call`/`mcp_list_tools`/`mcp_approval_request` with a `server_label` matching a configured MCP server, the event is dropped and the `output_index` is tracked.
2. **`response.mcp_call.*` / `response.mcp_list_tools.*`** -- Events with a tracked `output_index` are dropped.
3. **`response.output_item.done`** -- Matching items are dropped and the index tracking is cleared.

⚠️ Potential issue | 🟡 Minor

Document response.mcp_approval_request.* filtering as well.

Bullet 2 still lists only response.mcp_call.* and response.mcp_list_tools.*, but the implementation now drops response.mcp_approval_request.* events too. As written, this understates the SSE contract for Goose-style clients.

Proposed fix
-2. **`response.mcp_call.*` / `response.mcp_list_tools.*`** -- Events with a tracked `output_index` are dropped.
+2. **`response.mcp_call.*` / `response.mcp_list_tools.*` / `response.mcp_approval_request.*`** -- Events with a tracked `output_index` are dropped.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 1. **`response.output_item.added`** -- If the item is an `mcp_call`/`mcp_list_tools`/`mcp_approval_request` with a `server_label` matching a configured MCP server, the event is dropped and the `output_index` is tracked.
-2. **`response.mcp_call.*` / `response.mcp_list_tools.*`** -- Events with a tracked `output_index` are dropped.
+2. **`response.mcp_call.*` / `response.mcp_list_tools.*` / `response.mcp_approval_request.*`** -- Events with a tracked `output_index` are dropped.
 3. **`response.output_item.done`** -- Matching items are dropped and the index tracking is cleared.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/goose-example-tool-merging.md` around lines 82 - 84, Update the docs to
state that events matching tracked output_index are dropped for approval
requests as well: in the list replace or extend bullet 2 to include
response.mcp_approval_request.* alongside response.mcp_call.* and
response.mcp_list_tools.*, and ensure the surrounding bullets referencing
response.output_item.added and response.output_item.done still describe
tracking/clearing of the output_index for server_label-matched MCP items.

Comment on lines +465 to +528
def _is_server_mcp_output_item(
    item: dict[str, Any], configured_mcp_labels: set[str]
) -> bool:
    """Check if a serialized output item is a server-deployed MCP tool call.

    Args:
        item: A dict from the serialized response output array.
        configured_mcp_labels: Set of server_label names configured in LCS.

    Returns:
        True if the item is an MCP call/list/approval from a server-deployed MCP server.
    """
    item_type = item.get("type")
    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        return item.get("server_label") in configured_mcp_labels
    return False


def _should_filter_mcp_chunk(
    chunk: OpenAIResponseObjectStream,
    event_type: Optional[str],
    configured_mcp_labels: set[str],
    server_mcp_output_indices: set[int],
) -> bool:
    """Check if a streaming chunk is a server-deployed MCP event that should be filtered.

    Args:
        chunk: The streaming chunk to check.
        event_type: The event type of the chunk.
        configured_mcp_labels: Set of server_label names configured in LCS.
        server_mcp_output_indices: Tracked output indices of server-deployed MCP calls.

    Returns:
        True if the chunk should be filtered out from the client stream.
    """
    if event_type == "response.output_item.added":
        item_added_chunk = cast(OutputItemAddedChunk, chunk)
        item = item_added_chunk.item
        item_type = getattr(item, "type", None)
        if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
            server_label = getattr(item, "server_label", None)
            if server_label in configured_mcp_labels:
                server_mcp_output_indices.add(item_added_chunk.output_index)
                return True

    if event_type and (
        event_type.startswith("response.mcp_call.")
        or event_type.startswith("response.mcp_list_tools.")
        or event_type.startswith("response.mcp_approval_request.")
    ):
        output_index = getattr(chunk, "output_index", None)
        if output_index in server_mcp_output_indices:
            return True

    if event_type == "response.output_item.done":
        item_done_chunk = cast(OutputItemDoneChunk, chunk)
        item = item_done_chunk.item
        item_type = getattr(item, "type", None)
        if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
            if item_done_chunk.output_index in server_mcp_output_indices:
                server_mcp_output_indices.discard(item_done_chunk.output_index)
                return True

    return False

⚠️ Potential issue | 🟠 Major

Align MCP stream filtering with is_server_deployed_output().

_finalize_response() on Line 557 uses is_server_deployed_output(), which already treats unlabeled mcp_call items as server-deployed. This helper and the response.output_item.added tracking branch only accept label-matched MCP items, so Line 644 leaves an unlabeled server mcp_call in response.completed and its substream events are never tracked or filtered.

Proposed fix
 def _is_server_mcp_output_item(
     item: dict[str, Any], configured_mcp_labels: set[str]
 ) -> bool:
     """Check if a serialized output item is a server-deployed MCP tool call."""
     item_type = item.get("type")
+    if item_type == "mcp_call" and item.get("server_label") is None:
+        return True
     if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
         return item.get("server_label") in configured_mcp_labels
     return False
@@
     if event_type == "response.output_item.added":
         item_added_chunk = cast(OutputItemAddedChunk, chunk)
         item = item_added_chunk.item
         item_type = getattr(item, "type", None)
         if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
             server_label = getattr(item, "server_label", None)
+            if item_type == "mcp_call" and server_label is None:
+                server_mcp_output_indices.add(item_added_chunk.output_index)
+                return True
             if server_label in configured_mcp_labels:
                 server_mcp_output_indices.add(item_added_chunk.output_index)
                 return True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 465 - 528, The MCP filtering
helpers currently only treat labeled MCP items as server-deployed, but
is_server_deployed_output() considers unlabeled "mcp_call" as server-deployed;
update _is_server_mcp_output_item and _should_filter_mcp_chunk to mirror that
logic by treating an MCP item with no server_label (None/absent) as
server-deployed as well as those whose server_label is in configured_mcp_labels.
Concretely, change the label checks in _is_server_mcp_output_item, the
response.output_item.added branch in _should_filter_mcp_chunk (the
item_type/server_label check and adding to server_mcp_output_indices), and the
response.output_item.done branch to also accept server_label is None for
"mcp_call" (and the other MCP types if intended) so unlabeled server-deployed
MCP streams are tracked and filtered.

Comment on lines 1473 to +1489
vector_store_ids = extract_vector_store_ids_from_tools(tools)
# Translate user-facing rag_ids to llama-stack vector_store_ids in each file_search tool
byok_rags = configuration.configuration.byok_rag
prepared_tools = translate_tools_vector_store_ids(tools, byok_rags)

# Optionally merge server-configured tools (RAG, MCP) with client tools
if merge_server_tools:
server_tools = await prepare_tools(
client=client,
vector_store_ids=vector_store_ids,
no_tools=no_tools,
token=token,
mcp_headers=mcp_headers,
request_headers=request_headers,
)
if server_tools:
prepared_tools = _merge_tools(prepared_tools, server_tools)

⚠️ Potential issue | 🟠 Major

Don't suppress server RAG when the client has no file_search.

At Line 1473, extract_vector_store_ids_from_tools() returns [] if the request only contains function/MCP tools. Passing that empty list into prepare_tools() at Line 1482 is treated as an explicit override, so no server file_search is prepared. In practice, merged requests never receive server-configured RAG.

🩹 Minimal fix
```diff
         vector_store_ids = extract_vector_store_ids_from_tools(tools)
+        client_has_file_search = any(tool.type == "file_search" for tool in tools)
         # Translate user-facing rag_ids to llama-stack vector_store_ids in each file_search tool
         byok_rags = configuration.configuration.byok_rag
         prepared_tools = translate_tools_vector_store_ids(tools, byok_rags)

         # Optionally merge server-configured tools (RAG, MCP) with client tools
         if merge_server_tools:
             server_tools = await prepare_tools(
                 client=client,
-                vector_store_ids=vector_store_ids,
+                vector_store_ids=vector_store_ids if client_has_file_search else None,
                 no_tools=no_tools,
                 token=token,
                 mcp_headers=mcp_headers,
                 request_headers=request_headers,
             )
```
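The fix hinges on prepare_tools treating an empty list and None differently. A minimal stand-in illustrates the distinction; this stub is not the real prepare_tools, only a sketch of the semantics the finding assumes:

```python
# Stand-in for prepare_tools' vector_store_ids handling (illustrative only).
SERVER_DEFAULT_STORES = ["server-rag-store"]

def prepare_tools_stub(vector_store_ids=None):
    if vector_store_ids is None:
        # No client override: fall back to server-configured RAG stores.
        return SERVER_DEFAULT_STORES
    # An explicit list -- even an empty one -- overrides the server defaults,
    # which is why passing [] silently suppresses server file_search.
    return vector_store_ids
```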
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1473 - 1489: the code passes an empty
vector_store_ids (from extract_vector_store_ids_from_tools) into prepare_tools,
which treats it as an explicit override and suppresses server RAG. Change the
call so prepare_tools only receives vector_store_ids when the list is non-empty
(e.g., pass vector_store_ids=vector_store_ids if vector_store_ids else None, or
omit the arg), leaving the other args (client, no_tools, token, mcp_headers,
request_headers) unchanged. This ensures prepare_tools can still add server
file_search when the client provided none, while preserving
translate_tools_vector_store_ids/prepared_tools and the subsequent _merge_tools
behavior.

```python
await append_turn_items_to_conversation(
    client, api_params.conversation, user_input, latest_response_object.output
)
# Extract response metadata and persist conversation turn
```
Although this decomposition is valid, it is outside the scope of this PR. Please consider opening a separate PR for refactoring the handlers.


```python
for tool in client_tools:
    if isinstance(tool, InputToolMCP) and tool.server_label in server_mcp_labels:
        raise HTTPException(
```
Create a dedicated model in models/responses.py inheriting from AbstractErrorResponse rather than passing a plain detail string.
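A sketch of what such a model could look like; AbstractErrorResponse is stood in by a local dataclass here, and the class and field names are assumptions about the project's models/responses.py, not its actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AbstractErrorResponse:
    # Stand-in for the project's base error-response class.
    status_code: int
    detail: str

@dataclass
class ToolConflictErrorResponse(AbstractErrorResponse):
    """Hypothetical 409 response for a client tool clashing with a server tool."""
    status_code: int = 409
    detail: str = "Client tool conflicts with a server-configured tool"
    conflicting_label: Optional[str] = None
```

The endpoint would then raise with this model's payload instead of an ad-hoc detail string, so OpenAPI docs and clients see a consistent error schema.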

```python
            ),
        )
    if isinstance(tool, InputToolFileSearch) and has_server_file_search:
        raise HTTPException(
```
Here as well

```python
    return list(client_tools) + list(server_tools)


async def resolve_tool_choice(
```
Please rebase changes made to this function and adapt your changes to its new flow.

```python
prepared_tools = translate_tools_vector_store_ids(tools, byok_rags)

# Optionally merge server-configured tools (RAG, MCP) with client tools
if merge_server_tools:
```
This seems incorrect to me. prepare_tools should be called every time tools are not explicitly specified by the client; making it conditional on the merge header is not right.
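The flow the reviewer suggests could look like the sketch below; the function name and the exact precedence rules are stand-ins for illustration, not the project's implementation:

```python
# Illustrative resolution order: server tools are the default whenever the
# client supplies none, and the merge flag only matters when both exist.
def resolve_tools(client_tools, server_tools, merge_server_tools):
    if not client_tools:
        # No explicit client tools: always use server-configured tools,
        # regardless of the merge header.
        return list(server_tools)
    if merge_server_tools:
        # Merge header set: combine client and server tools.
        return list(client_tools) + list(server_tools)
    # Otherwise client tools explicitly replace server tools.
    return list(client_tools)
```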
