feat: MCP tool channel for composable TaskSet → Harness #1090

Open
hallerite wants to merge 4 commits into main from feat/mcp-tool-channel

hallerite (Member) commented Apr 1, 2026

Summary

  • MCPServerSpec dataclass for declaring MCP servers (command, env_vars, files to upload)
  • TaskSet.get_mcp_servers() hook — tasks declare what tools they need
  • Harness.format_mcp_config() hook — each harness maps MCP specs to its agent's native config format
  • ComposableEnv wiring — uploads server files into sandbox, calls format_mcp_config at setup

This enables tasksets to bring their own tools via the MCP standard protocol, without requiring fork-specific agent modifications. The harness stays generic — it just connects whatever MCP servers the task provides.
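
A minimal sketch of the declaration side, assuming the two fields visible in the reviewed diff (`command`, `env_vars`). The `TaskSet` base here is a trimmed stand-in, and `SearchTaskSet`, the script path, and the environment variable name are hypothetical, chosen only for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MCPServerSpec:
    """Declares one MCP server the agent should spawn."""

    command: list[str]
    env_vars: dict[str, str] | None = None


class TaskSet:
    """Stand-in base class, trimmed to the single hook discussed here."""

    def get_mcp_servers(self) -> dict[str, MCPServerSpec]:
        # Default: no MCP servers, so existing tasksets are unaffected.
        return {}


class SearchTaskSet(TaskSet):
    """Hypothetical taskset that brings its own web-search tool."""

    def get_mcp_servers(self) -> dict[str, MCPServerSpec]:
        return {
            "search": MCPServerSpec(
                command=["python", "/task/search_server.py"],  # illustrative path
                env_vars={"SERPER_API_KEY": "<set at runtime>"},  # illustrative name
            )
        }
```

The dictionary key doubles as the server name, so a harness can reuse it when it writes the agent's config.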

Motivation

Environments like DeepDive and BrowseComp need task-specific tools (web search via Serper/Exa, web fetch). Currently these are baked into a custom OpenCode fork as built-in TypeScript tools. With this change, a DeepDive taskset can ship a Python MCP server wrapping the Serper API, and any harness (OpenCode, Claude Code, etc.) can connect to it.

Design

TaskSet                    ComposableEnv              Harness
  get_mcp_servers()  →  uploads files to sandbox  →  format_mcp_config()
  (what tools)           (wiring layer)              (agent-native config)
  • stdio transport inside the sandbox — the MCP server is a script that the agent spawns as a subprocess
  • Harness-agnostic — each harness implementation maps MCPServerSpec to its own config (OpenCode's mcp JSON section, Claude Code's --mcp-server flags, etc.)
  • No changes needed for existing tasksets: get_mcp_servers() defaults to an empty dict
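
The harness side of the design could look roughly like the sketch below. `JsonConfigHarness` is hypothetical, and the JSON keys (`mcp`, `type`, `command`, `environment`) are an assumed layout loosely modeled on a JSON `mcp` section; a real harness should follow whatever schema its agent documents:

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class MCPServerSpec:
    command: list[str]
    env_vars: dict[str, str] | None = None


class JsonConfigHarness:
    """Hypothetical harness whose agent reads MCP servers from a JSON file."""

    def format_mcp_config(self, mcp_servers: dict[str, MCPServerSpec]) -> str:
        section = {}
        for name, spec in mcp_servers.items():
            section[name] = {
                "type": "local",  # assumed marker for a stdio subprocess server
                "command": spec.command,
                "environment": spec.env_vars or {},
            }
        # Serialize once so the caller can upload the result as a single file.
        return json.dumps({"mcp": section}, indent=2)
```

A harness that configures its agent through command-line flags would implement the same hook differently, which is why the mapping lives on the harness rather than on the taskset.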

Next steps

  • Implement format_mcp_config() in the OpenCode harness (research-environments PR)
  • Port DeepDive as the first MCP-enabled taskset
  • Port BrowseComp

Test plan

  • All 15 existing composable env tests pass
  • Ruff check + format clean
  • E2E test with a DeepDive MCP taskset (followup PR)

🤖 Generated with Claude Code


Note

Medium Risk
Changes the ComposableEnv sandbox setup sequence to inject task-provided MCP server configuration, which can break existing harnesses if tasks start declaring MCP servers without a corresponding format_mcp_config implementation.

Overview
Adds a new TaskSet→Harness “MCP tool server” channel so tasks can declare MCP servers and have the environment pass them to the agent.

This introduces MCPServerSpec plus a TaskSet.get_mcp_servers() hook, extends Harness with mcp_servers and an overridable format_mcp_config() method, and updates ComposableEnv.post_sandbox_setup() to call the hook, require a harness config formatter, and upload the generated config to /task/mcp_config.json before installing the agent.

Written by Cursor Bugbot for commit 9b9d228.
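
The setup sequence described in this overview can be sketched as follows. This is not the PR's code: `FakeSandbox`, `setup_with_mcp`, `DemoTaskSet`, and `DemoHarness` are hypothetical stand-ins, and only the error message and the `/task/mcp_config.json` path are taken from the diff excerpts on this page:

```python
import asyncio

MCP_CONFIG_PATH = "/task/mcp_config.json"


class FakeSandbox:
    """In-memory stand-in for a sandbox (hypothetical, for illustration)."""

    def __init__(self):
        self.files = {}
        self.log = []

    async def upload_content(self, content, path):
        self.files[path] = content
        self.log.append(f"upload {path}")

    async def install_agent(self):
        self.log.append("install agent")


async def setup_with_mcp(sandbox, taskset, harness):
    """Call the hook, require a formatter, upload the config, then install."""
    mcp_servers = taskset.get_mcp_servers()
    if mcp_servers:
        mcp_config = harness.format_mcp_config(mcp_servers)
        if mcp_config is None:
            # Fail hard rather than silently dropping the task's tools.
            raise NotImplementedError(
                f"TaskSet provides MCP servers {list(mcp_servers)} but "
                f"harness {type(harness).__name__} does not implement "
                f"format_mcp_config"
            )
        await sandbox.upload_content(mcp_config, MCP_CONFIG_PATH)
    await sandbox.install_agent()


class DemoTaskSet:
    def get_mcp_servers(self):
        return {"search": {"command": ["python", "server.py"]}}


class DemoHarness:
    def format_mcp_config(self, mcp_servers):
        return str(sorted(mcp_servers))


sandbox = FakeSandbox()
asyncio.run(setup_with_mcp(sandbox, DemoTaskSet(), DemoHarness()))
print(sandbox.log)
```

Tasksets that return an empty dict skip the MCP branch entirely and go straight to agent installation.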

f"TaskSet provides MCP servers {list(mcp_servers)} but "
f"harness {type(self.harness).__name__} does not implement "
f"format_mcp_config"
)

Contradictory None semantics in format_mcp_config contract

Medium Severity

The format_mcp_config docstring documents None as a valid return value meaning "the harness handles MCP servers through other means (e.g. by regenerating its run_command)." However, ComposableEnv.post_sandbox_setup treats a None return as an error and raises NotImplementedError. A harness subclass that follows the documented contract and returns None to signal alternative handling will crash at runtime. The None return is overloaded with two incompatible meanings: "not implemented" (base class default) and "handled differently" (documented subclass use case).
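
One possible way to resolve the overload, sketched under assumptions: keep `None` for "not implemented" and add a dedicated sentinel for "handled through other means". `HANDLED_ELSEWHERE`, `FlagHarness`, `resolve_mcp_config`, and the `--mcp-config` string are hypothetical names that do not appear in the PR:

```python
from __future__ import annotations


class _HandledElsewhere:
    """Sentinel type: the harness wired the MCP servers itself."""

    def __repr__(self) -> str:
        return "HANDLED_ELSEWHERE"


HANDLED_ELSEWHERE = _HandledElsewhere()


class Harness:
    def format_mcp_config(self, mcp_servers: dict) -> str | _HandledElsewhere | None:
        # Base default: None now means only "not implemented".
        return None


class FlagHarness(Harness):
    """Hypothetical harness that bakes servers into its run command."""

    def format_mcp_config(self, mcp_servers: dict) -> str | _HandledElsewhere | None:
        # Placeholder flag name, used purely for illustration.
        self.run_command = "agent " + " ".join(f"--mcp-config {n}" for n in mcp_servers)
        return HANDLED_ELSEWHERE


def resolve_mcp_config(harness: Harness, mcp_servers: dict) -> str | None:
    """Return config text to upload, or None when nothing needs uploading."""
    result = harness.format_mcp_config(mcp_servers)
    if result is None:
        raise NotImplementedError(
            f"harness {type(harness).__name__} does not implement format_mcp_config"
        )
    if result is HANDLED_ELSEWHERE:
        return None
    return result
```

With this split, the caller can still fail hard on an unimplemented hook while letting a harness opt out of the config-file upload explicitly.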


f"harness {type(self.harness).__name__} does not implement "
f"format_mcp_config"
)
await self.upload_content(sandbox_id, mcp_config, "/task/mcp_config.json")

Missing parent directory creation for MCP config upload

Medium Severity

The MCP config is uploaded to the hardcoded path /task/mcp_config.json without ensuring the /task directory exists. The existing code (step 2) only creates parent directories for instruction_path and system_prompt_path. If instruction_path is customized to a non-/task path (it's configurable on Harness), the /task directory won't be created, and the upload will fail. The /task path for MCP config needs to be included in the mkdir -p set, or have its own directory creation.
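
A sketch of the suggested fix, assuming the existing step builds a single `mkdir -p` command from the parent directories of the configured paths. `mkdir_command` and its parameters are hypothetical; only the `/task/mcp_config.json` path comes from the diff:

```python
import posixpath
import shlex

MCP_CONFIG_PATH = "/task/mcp_config.json"


def mkdir_command(instruction_path, system_prompt_path, has_mcp_servers):
    """Build one `mkdir -p` covering every directory a later upload needs."""
    paths = [instruction_path, system_prompt_path]
    if has_mcp_servers:
        # Include the MCP config's parent so the upload cannot fail when
        # instruction_path has been customized away from /task.
        paths.append(MCP_CONFIG_PATH)
    # Skip unset paths, deduplicate, and sort for a stable command string.
    dirs = sorted({posixpath.dirname(p) for p in paths if p})
    return "mkdir -p " + " ".join(shlex.quote(d) for d in dirs)
```

Using `posixpath` rather than `os.path` keeps the directory computation correct for a Linux sandbox even when the host runs another operating system.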


"""

command: list[str]
env_vars: dict[str, str] | None = None

Documentation not updated for new MCP functionality

Low Severity

This PR adds user-facing MCPServerSpec, TaskSet.get_mcp_servers(), Harness.format_mcp_config(), and Harness.mcp_servers — all part of the composable architecture already documented in docs/environments.md (lines 889–892) and docs/reference.md. None of these additions are reflected in the documentation. Per project rules, PRs that add or modify core user-facing functionality described in docs/ must update the relevant documentation.


Triggered by project rule: BugBot Instructions

hallerite and others added 4 commits April 1, 2026 21:16
TaskSets can now declare MCP servers via get_mcp_servers(), enabling
tasks to bring their own tools (e.g. web search, code execution)
without requiring fork-specific agent modifications.

- MCPServerSpec dataclass: command, env_vars, files
- TaskSet.get_mcp_servers() -> dict[str, MCPServerSpec]
- Harness.format_mcp_config() hook for agent-native config generation
- ComposableEnv uploads MCP server files and wires configs at setup

This enables future tasksets (DeepDive, BrowseComp) to ship MCP
servers that any harness can connect to via the standard protocol.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

If a taskset declares MCP servers but the harness doesn't implement
format_mcp_config, fail hard instead of silently dropping tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The taskset handles getting server scripts into the sandbox via its
own setup() method, docker image, or install script. MCPServerSpec
only needs command and env_vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

hallerite force-pushed the feat/mcp-tool-channel branch from 8939667 to 9b9d228 on April 1, 2026 21:17

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).


harness handles MCP servers through other means (e.g. by
regenerating its run_command).
"""
return None

Conflicting None semantics between docstring and caller

High Severity

The format_mcp_config docstring documents None as a valid return value meaning "the harness handles MCP servers through other means (e.g. by regenerating its run_command)." However, ComposableEnv.post_sandbox_setup treats any None return as an error and raises NotImplementedError. A harness subclass that intentionally returns None per the documented contract will crash. There's no way to distinguish "not overridden" from "handled through other means" since both return None.
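
An alternative that leaves the documented `None` return intact is to detect whether the subclass overrides the hook at all, instead of inspecting the return value. This is a sketch with hypothetical helper names (`overrides_format_mcp_config`, `config_to_upload`), not the PR's code:

```python
class Harness:
    def format_mcp_config(self, mcp_servers):
        # Base default: the hook is not implemented.
        return None


def overrides_format_mcp_config(harness) -> bool:
    """True when the harness class supplies its own format_mcp_config."""
    return type(harness).format_mcp_config is not Harness.format_mcp_config


def config_to_upload(harness, mcp_servers):
    """Return config text, or None if an overriding harness handled it itself."""
    if not overrides_format_mcp_config(harness):
        raise NotImplementedError(
            f"harness {type(harness).__name__} does not implement format_mcp_config"
        )
    # An overriding harness may legitimately return None ("handled elsewhere").
    return harness.format_mcp_config(mcp_servers)


class RunCommandHarness(Harness):
    """Hypothetical harness that regenerates its run_command instead."""

    def format_mcp_config(self, mcp_servers):
        self.run_command = "agent --with " + ",".join(sorted(mcp_servers))
        return None
```

The trade-off is that "implemented" becomes a property of the class rather than of the return value, so a subclass that overrides the hook only to call `super()` would pass the check while still doing nothing.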

