feat(agents): Add MiMo Code (mimo) ACP agent #679
Reviewing for the v0.6 release: the agent registry change itself looks good, but a review (including a trace of the integration-suite wiring) found 3 gaps in …
Each is a small …
MiMo Code (Xiaomi) is an OpenCode fork that ships a native `mimo acp`
JSON-RPC stdio server (its initialize handshake reports agentInfo.name=
"OpenCode"), so it registers as a first-class ACP agent with a
registry-only change mirroring opencode:
- AGENTS["mimo"]: _js_agent_install("mimo", "@mimo-ai/cli@0.1.0") into the
isolated /opt/benchflow prefix, launch "mimo acp", acp_model_format=
provider/model, env_mapping maps BOTH base_url+api_key (codex-acp
precedent) so the non-proxy path needs no core edit.
- tests/integration/configs/mimo.yaml: defaults to mimo/mimo-auto, MiMo's
free no-account channel (works headless in-sandbox; benchflow skips the
LiteLLM proxy and sends the native model id via ACP set_model).
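A minimal sketch of what an `env_mapping` that "maps BOTH base_url+api_key" does at launch time. It assumes the mapping is a plain dict from BenchFlow-side variable names to the names the agent reads. The variable names come from the PR description; `MIMO_ENV_MAPPING` and `apply_env_mapping` are hypothetical, not the registry's real code.

```python
from typing import Dict, Mapping

# Hypothetical stand-in for AGENTS["mimo"].env_mapping; the variable names come
# from the PR description, the dict shape is an assumption.
MIMO_ENV_MAPPING: Dict[str, str] = {
    "BENCHFLOW_PROVIDER_BASE_URL": "OPENAI_BASE_URL",
    "BENCHFLOW_PROVIDER_API_KEY": "OPENAI_API_KEY",
}


def apply_env_mapping(source: Mapping[str, str], mapping: Mapping[str, str]) -> Dict[str, str]:
    """Copy each mapped BenchFlow variable to the name the agent process reads.

    Variables missing from `source` are skipped instead of being set to "".
    """
    agent_env: Dict[str, str] = {}
    for src_name, dst_name in mapping.items():
        if src_name in source:
            agent_env[dst_name] = source[src_name]
    return agent_env


if __name__ == "__main__":
    # Placeholder values; in the proxy path these would be the proxy URL and master key.
    runtime_env = {
        "BENCHFLOW_PROVIDER_BASE_URL": "http://127.0.0.1:4000",
        "BENCHFLOW_PROVIDER_API_KEY": "sk-placeholder",
    }
    print(apply_env_mapping(runtime_env, MIMO_ENV_MAPPING))
```

Because both halves are mapped, the agent always receives a matched `OPENAI_BASE_URL`/`OPENAI_API_KEY` pair, so no agent-specific branch is needed in core.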
Validated on the symphony VM (daytona, v0.6.0rc6): registry invariants +
agent registry 171 passed, ruff/ty clean. Live rollouts on mimo-auto:
4/4 healthy, 2 solved (jax-computing-basics r=1.0/8 tools, threejs-to-obj
r=1.0/6 tools; data-to-d3 + weighted-gdp-calc healthy fails).
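The handshake claim in the commit message (an `initialize` response whose `agentInfo.name` is "OpenCode") can be checked with a few lines of JSON-RPC. This is a sketch under assumptions: ACP over stdio is treated as newline-delimited JSON-RPC 2.0, the `clientCapabilities` shown are a minimal assumed set, and the sample response is illustrative rather than captured output from `mimo acp`. Both helper names are hypothetical.

```python
import json
from typing import Optional


def build_initialize_request(request_id: int = 0) -> str:
    """Serialize an ACP `initialize` request as one newline-terminated JSON-RPC line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": 1,
            # Minimal, assumed client capabilities: no filesystem access offered.
            "clientCapabilities": {"fs": {"readTextFile": False, "writeTextFile": False}},
        },
    }
    return json.dumps(message) + "\n"


def reported_agent_name(response_line: str) -> Optional[str]:
    """Return result.agentInfo.name from an `initialize` response line, if present."""
    response = json.loads(response_line)
    result = response.get("result") or {}
    agent_info = result.get("agentInfo") or {}
    return agent_info.get("name")


if __name__ == "__main__":
    print(build_initialize_request(), end="")
    # Illustrative response shape, not captured output from `mimo acp`.
    sample = '{"jsonrpc": "2.0", "id": 0, "result": {"protocolVersion": 1, "agentInfo": {"name": "OpenCode"}}}'
    print(reported_agent_name(sample))
```

Against the real binary, the request line would go to the stdin of a `mimo acp` process and the response would be read back from its stdout.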
The committed `model: mimo/mimo-auto` does not run:
- there is no `mimo` provider registered (only `xiaomi`), so the model resolves to no provider; and
- `mimo-auto` is not a served model: the endpoint returns `400 "Not supported model mimo-auto"`.
Use the existing `xiaomi` provider with a real model id, and set `XIAOMI_BASE_URL` + `XIAOMI_API_KEY` to point at the MiMo endpoint.
Validated live on Daytona (agent=mimo, weighted-gdp-calc), graded by the agent-judge integration suite. Findings:
- benchflow wiring is correct: the proxy injects `OPENAI_BASE_URL=<proxy>` + `OPENAI_API_KEY=<master_key>` for the mimo agent (its `env_mapping` maps both base_url and api_key), and the agent-judge realness gate behaves correctly.
- REMAINING BLOCKER (agent-side, not benchflow): `@mimo-ai/cli@0.1.0` completes the ACP handshake and accepts the model, then ends the turn with zero model requests (`total_requests=0`, 0 tool calls, empty agent log), so every rollout is empty and the realness gate rejects it.
The MiMo Code CLI needs its model resolution debugged before this suite can pass; this config fix is necessary but not sufficient.
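Both failure modes above (an unregistered provider prefix, and a registered provider with incomplete credentials) can be caught before a rollout starts. A minimal sketch, assuming a provider table keyed by the `provider/` prefix; `PROVIDERS`, `resolve_provider`, and `has_required_creds` are hypothetical Python illustrations of the fail-closed gate described in the review, not the project's run.sh code.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical provider table mirroring the review: only `xiaomi` is registered,
# and it needs both an API key and a base URL.
PROVIDERS: Dict[str, Tuple[str, ...]] = {
    "xiaomi": ("XIAOMI_API_KEY", "XIAOMI_BASE_URL"),
}


def resolve_provider(model: str) -> Optional[str]:
    """Return the registered provider named by the `provider/` prefix, else None."""
    prefix, sep, _ = model.partition("/")
    if not sep or prefix not in PROVIDERS:
        return None
    return prefix


def has_required_creds(model: str, env: Mapping[str, str]) -> bool:
    """Fail closed: need a registered provider and a non-empty value for each of its vars."""
    provider = resolve_provider(model)
    if provider is None:
        return False
    return all(env.get(name) for name in PROVIDERS[provider])


if __name__ == "__main__":
    env = {"XIAOMI_API_KEY": "sk-placeholder"}
    print(resolve_provider("mimo/mimo-auto"))               # None: no `mimo` provider
    print(has_required_creds("xiaomi/mimo-v2.5-pro", env))  # False: base URL missing
```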
Review follow-ups for #679 (thanks @xdotli), rebased onto latest main:
- BLOCKER: the non-proxy modelId was mangled. For `--model xiaomi/mimo-v2.5-pro`, `strip_provider_prefix` removed the registered `xiaomi/` prefix and the models.dev heuristics fell through to `anthropic/mimo-v2.5-pro` (a nonexistent route). Added the `("mimo","xiaomi")` heuristic so the id passes through as `xiaomi/<model>`, the MiMo CLI catalog form. Proxy path untouched (`openai/benchflow-<alias>`); both shapes pinned in `test_litellm_hardening`.
- run.sh: the 3 review gaps. `mimo` in `ALL_AGENTS`; `model_for_agent` emits `xiaomi/mimo-v2.5-pro` verbatim (`check_results` compares by string equality); `has_creds_for` gates on `XIAOMI_API_KEY` + `XIAOMI_BASE_URL` (both required; `resolve_agent_env` fails closed without either).
- Conventional per-agent tests: the mimo web-policy `setup_cmd` writes `.config/mimocode/mimocode.json` with `tools.webfetch=false` (+ snippet-list coverage in `test_internet_policy`).
- docs/integration-tests.md: agent count 8→9 + XIAOMI creds row; the registry comment documents both modelId shapes.
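The modelId blocker is easier to see in code. The sketch below reconstructs the two id shapes under stated assumptions: the names `strip_provider_prefix` and the `("mimo", "xiaomi")` pair come from the follow-up above, but their bodies here are guesses, `("gpt", "openai")` is a hypothetical neighbouring heuristic, and `to_acp_model_id` and `proxy_alias` are illustrative names.

```python
from typing import List, Optional, Set, Tuple

# Illustrative heuristic table (substring of the bare model name -> models.dev
# provider). Only ("mimo", "xiaomi") is from this PR; ("gpt", "openai") is hypothetical.
MODELS_DEV_HEURISTICS: List[Tuple[str, str]] = [
    ("gpt", "openai"),
    ("mimo", "xiaomi"),
]
# The fall-through the review hit: unmatched names were routed to anthropic/.
FALLBACK_PROVIDER = "anthropic"


def strip_provider_prefix(model: str, registered: Set[str]) -> str:
    """Drop a leading `<provider>/` when that provider is registered."""
    prefix, sep, rest = model.partition("/")
    return rest if sep and prefix in registered else model


def to_acp_model_id(model: str, registered: Set[str], proxy_alias: Optional[str] = None) -> str:
    """Build the provider/model id an OpenCode-family agent receives via set_model."""
    if proxy_alias is not None:
        # Proxy path: the agent addresses the LiteLLM proxy through an alias.
        return f"openai/benchflow-{proxy_alias}"
    bare = strip_provider_prefix(model, registered)
    for needle, provider in MODELS_DEV_HEURISTICS:
        if needle in bare.lower():
            return f"{provider}/{bare}"
    return f"{FALLBACK_PROVIDER}/{bare}"


if __name__ == "__main__":
    registered = {"xiaomi"}
    print(to_acp_model_id("xiaomi/mimo-v2.5-pro", registered))  # xiaomi/mimo-v2.5-pro
    print(to_acp_model_id("xiaomi/mimo-v2.5-pro", registered, proxy_alias="mimo"))
```

Without the `mimo` entry, the stripped name `mimo-v2.5-pro` matches nothing and lands on the `anthropic/` fallback, which is exactly the nonexistent route the review reported.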
…points
MiMo Code pins its native models.dev xiaomi provider to the standard
platform endpoint, so token-plan/regional keys (XIAOMI_BASE_URL) fail
with "Invalid API Key". Ship the override through the existing
CredentialFile hook: when XIAOMI_API_KEY is present, write
~/.config/mimocode/mimocode.json with provider.xiaomi.options pointing
at {env:XIAOMI_BASE_URL}/{env:XIAOMI_API_KEY} — the CLI resolves the
env references at runtime, so no secret is materialized in the file.
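A minimal sketch of the override file contents, assuming OpenCode-style provider options. The `provider.xiaomi.options` path and the `{env:...}` references are from the commit message; the option key names `baseURL` and `apiKey` are an assumption based on OpenCode's config conventions, and `mimocode_override` is a hypothetical helper, not the actual CredentialFile hook.

```python
import json
from typing import Mapping, Optional


def mimocode_override(env: Mapping[str, str]) -> Optional[str]:
    """Return mimocode.json text for the xiaomi endpoint override, or None.

    Only `{env:...}` references are written; the CLI is expected to expand them
    at runtime, so the secret value itself never lands in the file.
    """
    if not env.get("XIAOMI_API_KEY"):
        return None  # no key present: nothing to write
    config = {
        "provider": {
            "xiaomi": {
                "options": {
                    # Key names assumed from OpenCode's provider-options convention.
                    "baseURL": "{env:XIAOMI_BASE_URL}",
                    "apiKey": "{env:XIAOMI_API_KEY}",
                }
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Would be written to ~/.config/mimocode/mimocode.json inside the sandbox.
    print(mimocode_override({"XIAOMI_API_KEY": "sk-placeholder", "XIAOMI_BASE_URL": "https://example.invalid"}))
```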
mimo.yaml runs with usage_tracking off: the CLI validates model ids
against its own catalog and rejects LiteLLM proxy aliases
(openai/benchflow-*), so the eval must send the native xiaomi/<model>
id directly (kept intact by the ("mimo","xiaomi") heuristic).
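Catalog validation is why proxy aliases cannot be used on this lane. A sketch under assumptions: the catalog slice is hypothetical (the real CLI ships its own models.dev-derived catalog), and `ProviderModelNotFoundError` here is a local stand-in named after the error the opencode family raises, not an import from it.

```python
from typing import Dict, Mapping, Set, Tuple


class ProviderModelNotFoundError(Exception):
    """Local stand-in named after the error the opencode family raises."""


# Hypothetical catalog slice; the real CLI ships its own models.dev-derived catalog.
CATALOG: Dict[str, Set[str]] = {"xiaomi": {"mimo-v2.5-pro"}}


def validate_model_id(model_id: str, catalog: Mapping[str, Set[str]]) -> Tuple[str, str]:
    """Split `provider/model` and reject ids the catalog does not list."""
    provider, sep, model = model_id.partition("/")
    if not sep or model not in catalog.get(provider, set()):
        raise ProviderModelNotFoundError(model_id)
    return provider, model


if __name__ == "__main__":
    print(validate_model_id("xiaomi/mimo-v2.5-pro", CATALOG))
    try:
        validate_model_id("openai/benchflow-mimo", CATALOG)  # hypothetical proxy alias
    except ProviderModelNotFoundError as err:
        print(f"rejected: {err}")
```

Since a proxy alias is never a catalog entry, the native `xiaomi/<model>` id is the only shape that passes, which is why usage tracking through the proxy has to stay off for this config.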
Live-verified on Daytona (citation-check, legacy layout):
- mimo + xiaomi/mimo-v2.5-pro: PASS reward=1.0, 12 tool calls
- control openhands + deepseek-v4-flash: PASS reward=1.0, 27 tool calls
@xdotli all three gaps closed, plus two deeper blockers your model-spec fix surfaced, and the lane is now live-verified end to end. Also rebased onto latest …
Your 3 run.sh gaps (…
Blocker 1: non-proxy modelId mangling (…
Blocker 2: fixed-endpoint auth (…
Live evidence (Daytona, citation-check, legacy layout): …
Also added the conventional per-agent tests (web-policy setup_cmd writes …
One heads-up beyond this PR: skillsbench upstream just converted all tasks to native task.md (…
Summary
Adds `mimo` (Xiaomi MiMo Code, npm `@mimo-ai/cli`) as a first-class ACP agent, layered on the v0.6.0 agent stack (#665).
MiMo Code is an OpenCode fork that ships a native `mimo acp` JSON-RPC stdio server; its `initialize` handshake reports `agentInfo.name="OpenCode"`. This therefore lands as a registry-only change mirroring the existing `opencode` entry, exactly as the registry docstring prescribes ("adding a new agent is a registry-only change"). No core edits, no shim, no `if agent == "mimo"` special cases.
Changes
- `src/benchflow/agents/registry.py`: `AGENTS["mimo"]`
  - `install_cmd=_js_agent_install("mimo", "@mimo-ai/cli@0.1.0")` (pinned; isolated `/opt/benchflow` node prefix)
  - `launch_cmd=_js_agent_launch("mimo", "acp")`, `protocol="acp"`, `acp_model_format="provider/model"` (models.dev ids, OpenCode lineage)
  - `env_mapping` maps both `BENCHFLOW_PROVIDER_BASE_URL` → `OPENAI_BASE_URL` and `BENCHFLOW_PROVIDER_API_KEY` → `OPENAI_API_KEY` (codex-acp precedent), so the non-proxy path needs no core edit
  - `~/.config/mimocode/mimocode.json` (`tools.webfetch=false`, opencode precedent)
- `tests/integration/configs/mimo.yaml`: integration config; defaults to `mimo/mimo-auto`, MiMo's free no-account channel (works headless in-sandbox; the runtime skips the LiteLLM proxy and delivers the native model id via ACP `set_model`)
Evidence (live on a GCP VM, Daytona sandbox, v0.6.0rc6)
- `tests/test_registry_invariants.py` + `tests/test_agent_registry.py`: 171 passed (the new entry is auto-covered by the executable registry schema, incl. the JS-isolation invariants)
- `ruff check` / `ruff format --check` / `ty check`: clean; the generated `install_cmd` passes `dash -n`
- `initialize` → protocolVersion 1, agentInfo "OpenCode"
- `mimo/mimo-auto` (SkillsBench task.md tasks): 4/4 healthy (tools>0, tokens>0, reward non-None), 2 solved: `jax-computing-basics` r=1.0 (8 tools, 26k tok), `threejs-to-obj` r=1.0 (6 tools); `data-to-d3` and `weighted-gdp-calc` ran healthy but failed
Notes for reviewers
- The `mimo-auto` channel needs no credentials; for MiMo's flagship models, `xiaomi/mimo-v2.5-pro` routes through the existing `xiaomi` provider (`XIAOMI_API_KEY`/`XIAOMI_BASE_URL`).
- LiteLLM proxy aliases (`openai/benchflow-…`) are rejected by the opencode family's models.dev validation (`ProviderModelNotFoundError`). This affects `opencode` identically and predates this PR, so the integration config defaults to the native channel.
How to test