mini-computer-agent: minimal model-agnostic computer-use agent by xdotli · Pull Request #6 · benchflow-ai/agents

xdotli · 2026-06-13T17:50:03Z

mini-computer-agent

A minimal, model-agnostic computer-use agent — the computer-use analog of mini-swe-agent. ~175 lines, one dependency (litellm).

What it is

One loop: screenshot (scrot) → vision model (any, via litellm) → one JSON action → xdotool, repeat. No vendor computer-use tool, no separate grounding model, no tool-calling, and no benchflow/protocol code in the agent itself — it's a pure run callable. BenchFlow's generic ACP serve wraps it (separate benchflow change), so it needs no per-agent shim.
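The loop above can be sketched roughly as follows. This is an illustration, not the package's code: `run_loop`, the injected callables, the `"action"` key, and the `"done"` terminator are all hypothetical names chosen for the example. In the real agent the three callables would correspond to scrot, a litellm vision call, and xdotool.

```python
import json


def run_loop(task, take_screenshot, ask_model, execute, max_steps=20):
    """Screenshot -> model -> one JSON action -> execute, until done or max_steps.

    All three callables are injected so the sketch runs without a display
    or a model; their names and signatures are assumptions.
    """
    history = []
    for _ in range(max_steps):
        png = take_screenshot()                # e.g. PNG bytes captured by scrot
        reply = ask_model(task, png, history)  # e.g. text returned by a vision model
        action = json.loads(reply)             # exactly one JSON action per step
        history.append(action)
        if action.get("action") == "done":     # hypothetical terminator action
            break
        execute(action)                        # e.g. translate the action to xdotool
    return history
```

Injecting the side-effecting pieces keeps the loop itself free of protocol or harness code, which matches the "pure run callable" framing above.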

Load-bearing detail: coordinate scaling

Vision models emit coordinates normalized to [0,1000], not pixels. core.scale() maps them back (x_px = x/1000 · W, where W is the screen width in pixels; y scales the same way with the height). Without this, every click lands ~3× off and looks like the model "can't ground"; with it, grounding is solid. (E.g. 195/508/820 → 250/650/1050 px, consistent with W ≈ 1280.)
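A minimal sketch of the scaling step, assuming a signature that takes both normalized coordinates plus the screen size. The actual `core.scale()` signature is not shown in this PR, and the rounding and clamping here are illustrative choices. The 1280-pixel width is inferred from the example numbers above; the 720-pixel height is purely an assumption.

```python
def scale(x_norm, y_norm, width, height):
    """Map [0,1000]-normalized coordinates to pixel coordinates.

    Hypothetical signature; rounding and clamping are assumptions,
    not necessarily what core.scale() does.
    """
    x_px = round(x_norm / 1000 * width)
    y_px = round(y_norm / 1000 * height)
    # Clamp so a normalized value of 1000 still lands on a valid pixel.
    x_px = min(max(x_px, 0), width - 1)
    y_px = min(max(y_px, 0), height - 1)
    return x_px, y_px


# Reproduces the example from the description on an assumed 1280x720 screen.
xs = [scale(x, 0, 1280, 720)[0] for x in (195, 508, 820)]
print(xs)  # [250, 650, 1050]
```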

Validation (BenchFlow + Daytona, gemini-3.5-flash, ACP-served)

7 / 8 overall; single-target grounding 6 / 6.

Task	Reward
click-grounding (synthetic)	1.0
MiniWoB++ click-button / link / dialog / option / tab	1.0 each
MiniWoB++ enter-text	1.0 (after a 1-line focus-fix)
MiniWoB++ click-checkboxes	−0.33 (open: multi-select under-reading)

Demonstrated dogfood → fix → verify: enter-text flailed on field focus (13 steps); one line of prompt guidance (focus-then-type, verify-before-submit) → clean pass (7 steps).

Follow-ups

click-checkboxes multi-select thoroughness is the lone open task failure.
Consumed by BenchFlow via its generic ACP serve; the registry entry pip-installs this package.

Note

Low Risk
New isolated package and CI only; no changes to existing agents beyond an extra lint job step.

Overview
Introduces mini-computer-agent, a new standalone package that implements a minimal desktop GUI loop: screenshot → vision model (litellm) → one JSON action → xdotool, exposed as run(task, model, on_step=...). The agent stays protocol-free (BenchFlow wraps it elsewhere); it depends only on litellm plus system tools scrot / xdotool.

The core adds [0,1000] → pixel coordinate scaling before clicks (vision-model convention), tolerant JSON action parsing, and prompt guidance for focus-then-type / verify-before-done. Hermetic tests cover parsing, scaling, and PNG dimension reads without a model or sandbox.
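Two of the helpers described above, tolerant action parsing and PNG dimension reads, might look something like this. Both function names are hypothetical and the package's real implementations may differ. The parser simply returns the first JSON object it can decode from the reply; the PNG reader relies only on the fixed layout of the file signature and IHDR chunk.

```python
import json
import struct


def parse_action(reply: str) -> dict:
    """Return the first JSON object in a model reply, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(reply):
        if ch != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(reply, i)  # decode starting at this brace
        except json.JSONDecodeError:
            continue  # not valid JSON from here; try the next brace
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in model reply")


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk without an imaging library."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then two
    # big-endian uint32 values holding width and height.
    if data[:8] != b"\x89PNG\r\n\x1a\n" or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Because neither helper touches a model, a display, or the network, both can be tested hermetically, in line with the test scope described above.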

CI is extended with ruff lint/format for the new package and a path-filtered pytest workflow (test-mini-computer-agent.yaml), aligned with existing mini-swe-agent workflows.

Reviewed by Cursor Bugbot for commit 902f16a. Bugbot is set up for automated code reviews on this repo.

… analog) Pure screenshot -> any vision model (litellm) -> one JSON action -> xdotool, in ~150 lines with no protocol/harness code. Coordinates are [0,1000]-normalized and scaled to pixels (raw use mis-clicks ~3x). Hermetic tests (parse/scale/PNG) + lint and pytest CI. Validated end-to-end on benchflow+Daytona with gemini-3.5-flash: 7/8 tasks, single-target grounding 6/6.

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.


cursor · 2026-06-13T17:51:56Z

+    elif kind == "scroll":
+        dy = int(action.get("dy", 0))
+        for _ in range(max(1, abs(dy))):
+            _xdotool("click", "4" if dy < 0 else "5")


Zero scroll still scrolls down

Low Severity

In the scroll branch, max(1, abs(dy)) forces at least one wheel click when dy is 0 or omitted (defaults to 0), so a no-op scroll becomes one downward scroll. That contradicts the prompt’s signed dy semantics and can move the UI when the model intended no scrolling.
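One possible fix, sketched under the assumption that `dy` counts wheel clicks and that negative values mean "up", as the quoted diff implies. The helper name `scroll_clicks` is hypothetical. Dropping the `max(1, ...)` floor makes `dy == 0` a true no-op.

```python
def scroll_clicks(dy: int) -> list[str]:
    """Return the X11 button names to click for a signed scroll amount.

    Button "4" is wheel-up and "5" is wheel-down, as in the quoted diff.
    A dy of 0 yields an empty list, so nothing is clicked.
    """
    button = "4" if dy < 0 else "5"
    return [button] * abs(dy)


# In the scroll branch this would replace the range(max(1, abs(dy))) loop:
#     for button in scroll_clicks(int(action.get("dy", 0))):
#         _xdotool("click", button)
```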


xdotli wants to merge 1 commit into main from feat/mini-computer-acp
