feat(agents): Add MiMo Code (mimo) ACP agent#679

Open
Yiminnn wants to merge 4 commits into main from feat/mimo-agent

@Yiminnn (Collaborator) commented Jun 11, 2026

Summary

Adds mimo (Xiaomi MiMo Code, npm @mimo-ai/cli) as a first-class ACP agent, layered on the v0.6.0 agent stack (#665).

MiMo Code is an OpenCode fork that ships a native mimo acp JSON-RPC stdio server — its initialize handshake reports agentInfo.name="OpenCode" — so this lands as a registry-only change mirroring the existing opencode entry, exactly as the registry docstring prescribes ("adding a new agent is a registry-only change"). No core edits, no shim, no if agent == "mimo" special cases.

Changes

  • src/benchflow/agents/registry.py (AGENTS["mimo"]):
    • install_cmd=_js_agent_install("mimo", "@mimo-ai/cli@0.1.0") (pinned; isolated /opt/benchflow node prefix)
    • launch_cmd=_js_agent_launch("mimo", "acp"), protocol="acp", acp_model_format="provider/model" (models.dev ids, OpenCode lineage)
    • env_mapping maps both BENCHFLOW_PROVIDER_BASE_URL→OPENAI_BASE_URL and BENCHFLOW_PROVIDER_API_KEY→OPENAI_API_KEY (codex-acp precedent) so the non-proxy path needs no core edit
    • web tools disabled via ~/.config/mimocode/mimocode.json (tools.webfetch=false, opencode precedent)
  • tests/integration/configs/mimo.yaml — integration config; defaults to mimo/mimo-auto, MiMo's free no-account channel (works headless in-sandbox; the runtime skips the LiteLLM proxy and delivers the native model id via ACP set_model)
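
For readers who have not opened registry.py, here is a minimal Python sketch of the shape such an entry might take and of how env_mapping could be applied. AgentSpec and apply_env_mapping are hypothetical stand-ins, not the project's actual types, and the install command is a placeholder because the output of _js_agent_install is not shown in this PR.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical stand-in for a registry entry; field names follow the PR text."""

    install_cmd: str
    launch_cmd: str
    protocol: str
    acp_model_format: str
    env_mapping: dict = field(default_factory=dict)


# Placeholder install string: the real one is generated by _js_agent_install,
# whose output is not reproduced here.
MIMO = AgentSpec(
    install_cmd="<generated by _js_agent_install('mimo', '@mimo-ai/cli@0.1.0')>",
    launch_cmd="mimo acp",
    protocol="acp",
    acp_model_format="provider/model",
    env_mapping={
        "BENCHFLOW_PROVIDER_BASE_URL": "OPENAI_BASE_URL",
        "BENCHFLOW_PROVIDER_API_KEY": "OPENAI_API_KEY",
    },
)


def apply_env_mapping(spec: AgentSpec, env: dict) -> dict:
    """Copy each mapped source variable that is set to its agent-side name."""
    return {dst: env[src] for src, dst in spec.env_mapping.items() if src in env}
```

Mapping both variables means the agent sees a complete OPENAI_* pair whenever benchflow provides one, which is the property the codex-acp precedent relies on.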

Evidence (live on a GCP VM, Daytona sandbox, v0.6.0rc6)

  • tests/test_registry_invariants.py + tests/test_agent_registry.py: 171 passed (the new entry is auto-covered by the executable registry schema, incl. the JS-isolation invariants)
  • ruff check / ruff format --check / ty check: clean; generated install_cmd passes dash -n
  • Stdio ACP handshake verified: initialize → protocolVersion 1, agentInfo "OpenCode"
  • Live rollouts on mimo/mimo-auto (SkillsBench task.md tasks): 4/4 healthy (tools>0, tokens>0, reward non-None), 2 solved — jax-computing-basics r=1.0 (8 tools, 26k tok), threejs-to-obj r=1.0 (6 tools); data-to-d3, weighted-gdp-calc healthy fails
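
The handshake check above can be reproduced roughly as follows. This is a sketch that assumes ACP frames JSON-RPC 2.0 messages as one JSON object per line over stdio and that initialize takes protocolVersion and clientCapabilities; the SAMPLE response is hand-written to match the values reported above, not captured output.

```python
import json


def build_initialize(request_id: int = 1) -> str:
    """Serialize an initialize request as one newline-terminated JSON line."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        # Parameter names are assumed from the ACP spec, not taken from this PR.
        "params": {"protocolVersion": 1, "clientCapabilities": {}},
    }
    return json.dumps(request) + "\n"


def parse_initialize(line: str) -> tuple:
    """Return (protocolVersion, agentInfo.name) from a response line."""
    message = json.loads(line)
    if "error" in message:
        raise RuntimeError(f"initialize failed: {message['error']}")
    result = message["result"]
    return result["protocolVersion"], result.get("agentInfo", {}).get("name", "")


# Hand-written response shaped like the values reported in the evidence list.
SAMPLE = (
    '{"jsonrpc": "2.0", "id": 1, '
    '"result": {"protocolVersion": 1, "agentInfo": {"name": "OpenCode"}}}'
)
print(parse_initialize(SAMPLE))
```

In a live check, the line from build_initialize would be written to the stdin of the `mimo acp` process and the first stdout line passed to parse_initialize.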

Notes for reviewers

  • The free mimo-auto channel needs no credentials; for MiMo's flagship models, xiaomi/mimo-v2.5-pro routes through the existing xiaomi provider (XIAOMI_API_KEY/XIAOMI_BASE_URL).
  • Proxy-alias models (e.g. openai/benchflow-…) are rejected by the opencode family's models.dev validation (ProviderModelNotFoundError) — this affects opencode identically and predates this PR; the integration config therefore defaults to the native channel.

How to test

uv run python -m pytest tests/test_registry_invariants.py tests/test_agent_registry.py -q
uv run bench agent show mimo
uv run bench eval create --tasks-dir <tasks> --include jax-computing-basics \
  --agent mimo --model mimo/mimo-auto --sandbox daytona --skill-mode no-skill \
  --concurrency 1 --jobs-dir jobs/mimo-smoke

@xdotli (Member) commented Jun 12, 2026

Reviewing for the v0.6 release. The agent registry change itself looks good, but the review (including a trace of the integration-suite wiring) found 3 gaps in tests/integration/run.sh that would keep mimo from running in the default suite and cause a config-audit mismatch. Flagging them so this can land cleanly:

  1. mimo is missing from ALL_AGENTS (run.sh:49-58): a no-arg run.sh never runs the new agent, so the shipped mimo.yaml is dead config in the normal suite path.
  2. model_for_agent has no mimo case (run.sh:41-47 + check_results.py:728-730): run.sh ignores the YAML model: and falls back to the default gemini-3.1-flash-lite-preview, while config.json records xiaomi/mimo-v2.5-pro from the YAML, which guarantees a model-mismatch audit failure.
  3. has_creds_for has no mimo case (run.sh:95-109): it falls back to has_gemini_key/GEMINI_API_KEY, but mimo needs XIAOMI_API_KEY/XIAOMI_BASE_URL (per the YAML comment), so the suite green-lights launching mimo behind the wrong credential gate and the eval then fails closed in resolve_agent_env.
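
To illustrate gaps 2 and 3, here is a small Python model of the intended lookup behaviour. It is not the actual run.sh, which is shell; the table names and the audit_model helper are hypothetical, and the sketch assumes both Xiaomi variables are required.

```python
DEFAULT_MODEL = "gemini-3.1-flash-lite-preview"
DEFAULT_CREDS = ("GEMINI_API_KEY",)

# Hypothetical per-agent tables standing in for the shell case statements.
AGENT_MODELS = {"mimo": "xiaomi/mimo-v2.5-pro"}
AGENT_CREDS = {"mimo": ("XIAOMI_API_KEY", "XIAOMI_BASE_URL")}


def model_for_agent(agent: str, models: dict = AGENT_MODELS) -> str:
    """Return the agent's model, or the suite default when no case exists."""
    return models.get(agent, DEFAULT_MODEL)


def has_creds_for(agent: str, env: dict, creds: dict = AGENT_CREDS) -> bool:
    """True only when every variable required for the agent is set and non-empty."""
    return all(env.get(name) for name in creds.get(agent, DEFAULT_CREDS))


def audit_model(agent: str, recorded: str, models: dict = AGENT_MODELS) -> bool:
    """Model audit as a plain string comparison against the recorded config."""
    return model_for_agent(agent, models) == recorded
```

With an empty models table, model_for_agent("mimo", {}) returns the Gemini default and audit_model fails against a recorded xiaomi/mimo-v2.5-pro, which is the mismatch described in gap 2.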

Each is a small run.sh/model_for_agent/has_creds_for addition. Happy to help wire these if useful — leaving the PR to you since it's yours.

Yiminnn and others added 3 commits June 12, 2026 23:47

MiMo Code (Xiaomi) is an OpenCode fork that ships a native `mimo acp`
JSON-RPC stdio server (its initialize handshake reports agentInfo.name=
"OpenCode"), so it registers as a first-class ACP agent with a
registry-only change mirroring opencode:

- AGENTS["mimo"]: _js_agent_install("mimo", "@mimo-ai/cli@0.1.0") into the
  isolated /opt/benchflow prefix, launch "mimo acp", acp_model_format=
  provider/model, env_mapping maps BOTH base_url+api_key (codex-acp
  precedent) so the non-proxy path needs no core edit.
- tests/integration/configs/mimo.yaml: defaults to mimo/mimo-auto, MiMo's
  free no-account channel (works headless in-sandbox; benchflow skips the
  LiteLLM proxy and sends the native model id via ACP set_model).

Validated on the symphony VM (daytona, v0.6.0rc6): registry invariants +
agent registry 171 passed, ruff/ty clean. Live rollouts on mimo-auto:
4/4 healthy, 2 solved (jax-computing-basics r=1.0/8 tools, threejs-to-obj
r=1.0/6 tools; data-to-d3 + weighted-gdp-calc healthy fails).

The committed `model: mimo/mimo-auto` does not run:
- there is no `mimo` provider registered (only `xiaomi`), so the model
  resolves to no provider; and
- `mimo-auto` is not a served model — the endpoint returns
  `400 "Not supported model mimo-auto"`.

Use the existing `xiaomi` provider with a real model id; set
XIAOMI_BASE_URL + XIAOMI_API_KEY to point at the MiMo endpoint.

Validated live on Daytona (agent=mimo, weighted-gdp-calc), graded by the
agent-judge integration suite. Findings:
- benchflow wiring is correct: the proxy injects OPENAI_BASE_URL=<proxy> +
  OPENAI_API_KEY=<master_key> for the mimo agent (its env_mapping maps both
  base_url and api_key), and the agent-judge realness gate behaves correctly.
- REMAINING BLOCKER (agent-side, not benchflow): @mimo-ai/cli@0.1.0 completes
  the ACP handshake and accepts the model, then ends the turn with zero model
  requests (total_requests=0, 0 tool calls, empty agent log) — so every rollout
  is empty and the realness gate rejects it. The MiMo Code CLI needs its model
  resolution debugged before this suite can pass; this config fix is necessary
  but not sufficient.

Review follow-ups for #679 (thanks @xdotli), rebased onto latest main:

- BLOCKER: non-proxy modelId was mangled — for --model xiaomi/mimo-v2.5-pro,
  strip_provider_prefix removed the registered xiaomi/ prefix and the
  models.dev heuristics fell through to anthropic/mimo-v2.5-pro (nonexistent
  route). Added the ("mimo","xiaomi") heuristic so the id passes through as
  xiaomi/<model> — the MiMo CLI catalog form. Proxy path untouched
  (openai/benchflow-<alias>); both shapes pinned in test_litellm_hardening.
- run.sh: the 3 review gaps — mimo in ALL_AGENTS, model_for_agent emits
  xiaomi/mimo-v2.5-pro verbatim (check_results compares by string equality),
  has_creds_for gates on XIAOMI_API_KEY + XIAOMI_BASE_URL (both required;
  resolve_agent_env fails closed without either).
- Conventional per-agent tests: mimo web-policy setup_cmd writes
  .config/mimocode/mimocode.json with tools.webfetch=false (+ snippet-list
  coverage in test_internet_policy).
- docs/integration-tests.md: agent count 8→9 + XIAOMI creds row; registry
  comment documents both modelId shapes.

Yiminnn changed the base branch from release/v0.6.0 to main June 12, 2026 23:53

…points

MiMo Code pins its native models.dev xiaomi provider to the standard
platform endpoint, so token-plan/regional keys (XIAOMI_BASE_URL) fail
with "Invalid API Key". Ship the override through the existing
CredentialFile hook: when XIAOMI_API_KEY is present, write
~/.config/mimocode/mimocode.json with provider.xiaomi.options pointing
at {env:XIAOMI_BASE_URL}/{env:XIAOMI_API_KEY} — the CLI resolves the
env references at runtime, so no secret is materialized in the file.

mimo.yaml runs with usage_tracking off: the CLI validates model ids
against its own catalog and rejects LiteLLM proxy aliases
(openai/benchflow-*), so the eval must send the native xiaomi/<model>
id directly (kept intact by the ("mimo","xiaomi") heuristic).

Live-verified on Daytona (citation-check, legacy layout):
- mimo + xiaomi/mimo-v2.5-pro: PASS reward=1.0, 12 tool calls
- control openhands + deepseek-v4-flash: PASS reward=1.0, 27 tool calls

@Yiminnn (Collaborator, Author) commented Jun 13, 2026

@xdotli all three gaps closed, plus two deeper blockers your model-spec fix surfaced — and the lane is now live-verified end to end. Also rebased onto latest main (001da3e) and retargeted the PR there, so the real CI gate runs.

Your 3 run.sh gaps (ea92d688):

  • mimo added to ALL_AGENTS
  • model_for_agent emits xiaomi/mimo-v2.5-pro verbatim (check_results compares by string equality)
  • has_creds_for gates on XIAOMI_API_KEY and XIAOMI_BASE_URL (both required — resolve_agent_env fails closed without either)

Blocker 1 — non-proxy modelId mangling (ea92d688): strip_provider_prefix removed the registered xiaomi/ prefix and the models.dev heuristics fell through to anthropic/mimo-v2.5-pro (nonexistent route, silent except a log warning). Fixed with a ("mimo","xiaomi") heuristic entry; both shapes pinned in test_litellm_hardening (xiaomi/... passes through; proxy aliases still format to openai/benchflow-*).
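
To make the mangling concrete, here is a minimal Python sketch of the non-proxy path. It assumes the heuristics are an ordered sequence of (substring, provider) pairs with an anthropic fall-through. Only strip_provider_prefix and the ("mimo","xiaomi") entry are named in this PR; format_acp_model, REGISTERED_PROVIDERS, and the data shapes are hypothetical, and the proxy-alias path is out of scope.

```python
# Illustrative subset; the real provider registry is larger.
REGISTERED_PROVIDERS = {"xiaomi"}

# (substring of the bare model name, models.dev provider). Other entries omitted.
HEURISTICS = (("mimo", "xiaomi"),)
FALLBACK_PROVIDER = "anthropic"  # the fall-through described above


def strip_provider_prefix(model: str) -> str:
    """Drop a leading 'provider/' when the provider is registered."""
    provider, sep, rest = model.partition("/")
    return rest if sep and provider in REGISTERED_PROVIDERS else model


def format_acp_model(model: str, heuristics=HEURISTICS) -> str:
    """Rebuild a provider/model id from the bare name via substring heuristics."""
    bare = strip_provider_prefix(model)
    for needle, provider in heuristics:
        if needle in bare:
            return f"{provider}/{bare}"
    return f"{FALLBACK_PROVIDER}/{bare}"
```

Calling format_acp_model("xiaomi/mimo-v2.5-pro", heuristics=()) reproduces the pre-fix result anthropic/mimo-v2.5-pro, while the default table keeps the id as xiaomi/mimo-v2.5-pro.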

Blocker 2 — fixed-endpoint auth (f51d340c): the CLI pins its native xiaomi provider to the standard platform endpoint, so token-plan/regional keys fail with "Invalid API Key". Shipped via the existing CredentialFile hook: when XIAOMI_API_KEY is set, benchflow writes ~/.config/mimocode/mimocode.json overriding provider.xiaomi.options with {env:XIAOMI_BASE_URL}/{env:XIAOMI_API_KEY} references (resolved by the CLI at runtime — no secret materialized in the file). mimo.yaml runs with usage_tracking: "off" since the CLI rejects LiteLLM proxy aliases (openai/benchflow-*) against its catalog — the native id must go through directly.
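
A rough Python sketch of the file content such a hook could write. render_mimocode_config is a hypothetical helper, not the CredentialFile API, and the option key names baseURL and apiKey are assumed from OpenCode's config format rather than confirmed for the MiMo CLI.

```python
import json
from typing import Optional


def render_mimocode_config(env: dict) -> Optional[str]:
    """Return mimocode.json content, or None when XIAOMI_API_KEY is unset."""
    if not env.get("XIAOMI_API_KEY"):
        return None
    config = {
        "provider": {
            "xiaomi": {
                "options": {
                    # Key names assumed from OpenCode's config format.
                    # Values are env references, so no secret lands in the file.
                    "baseURL": "{env:XIAOMI_BASE_URL}",
                    "apiKey": "{env:XIAOMI_API_KEY}",
                }
            }
        }
    }
    return json.dumps(config, indent=2)
```

The environment is consulted only to decide whether to write the file; the key's value never appears in the rendered text.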

Live evidence (Daytona, citation-check, legacy layout):

  • mimo + xiaomi/mimo-v2.5-pro: PASS, reward 1.0, 12 tool calls (model id arrives intact, token-plan endpoint authenticated)
  • control openhands + deepseek/deepseek-v4-flash: PASS, reward 1.0, 27 tool calls

Also added the conventional per-agent tests (web-policy setup_cmd writes mimocode.json tools.webfetch=false + snippet coverage) and the docs updates (integration-tests.md 8→9 agents + creds row).

One heads-up beyond this PR: skillsbench upstream just converted all tasks to native task.md (64395c1, skillsbench@1.2 draft) — main's legacy-only discovery can't see those packages, which makes landing the v0.6 task.md line more pressing. The live proofs above used the pre-migration legacy layout.
