UI Automata

The AI Toolkit for Windows Desktop Automation

Website | Download | Docs

UI Automata lets an AI agent write, run, and debug automation workflows across any Windows application (Win32, WPF, WinForms, WinUI, UWP) and Edge-based browsers, in the same workflow file.

We at Vision Cortex built it to help clients in industrial design automate CAD/CAM tasks: multi-step workflows in desktop applications that have no API, require precise timing, and need to handle popups and error dialogs without failing silently. We're open-sourcing it so other teams can do the same.

The problem

Windows UI work is exactly what AI agents should handle. But every step can fail in ways a script cannot see:

Timing: a button click completes before the app finishes processing. The UI looks ready; it is not.
Transient disabled state: the element exists and is visible, but temporarily disabled. The click fires, nothing happens, no error is returned.
Popup interruptions: a modal dialog captures focus mid-step. The script waits for an outcome that will never come.
Stale handles: the app rebuilds its UI after a navigation. Cached references point to the wrong place; clicks land silently on the wrong target.
Focus loss: a keypress meant for one field lands on another.

These are not edge cases: they are routine in any real Windows application.

Quick Demo

Demo video: Automata.Demo.mp4

Prompt: "Install Python on this machine from the Microsoft Store, not python.org. Pick the latest 3.x version."

The agent opens the Store via its app URI, searches for Python, reads the result cards to identify 3.13, clicks Get, and polls until installation completes.

Prompt: "Grab the latest installer from the official site, run it silently, then open Git Bash and confirm it works."

For Git, the agent navigates to gitforwindows.org in Edge and triggers the download through UIA, because CDP synthetic clicks are blocked for file downloads. It watches the Downloads panel until the installer has finished downloading, runs it silently with UAC confirmation, then falls back to vision OCR to read the Git Bash terminal output, where UIA has no coverage.

Installation

PowerShell -ExecutionPolicy Bypass -Command "iwr https://raw.githubusercontent.com/visioncortex/ui-automata/refs/heads/main/install/install-windows.ps1 | iex"

Use cases

UI Automata is designed to be driven by an AI agent through its MCP server. Here are some example prompts:

Automate a desktop application

Demo video: mastercam-demo.mp4

Use the ui-automata skill. Open Mastercam, load the file at C:\Projects\part.mcam, open the simulator window, and export the result csv to C:\output.

The agent walks the element tree, tests selectors live, writes the workflow YAML, and runs it (all in one session). You provide the intent and review the result.

Fix a broken workflow

Our workflow that enters invoices into AccountMate is failing with a stale handle error after the confirmation dialog closes. Here's the error trace. Fix it.

The agent replays the workflow, pauses at the failure, inspects the live element tree, and adjusts the anchor or selector to handle the UI rebuild.

Automate across desktop and browser

Download this month's supplier invoices from our vendor portal in Edge, then open our accounting desktop app and enter each one. The portal URL is...

Workflows can mix browser steps (CDP-powered, structured DOM access) with desktop app steps in a single file. No need to stitch together separate tools.

How it works

Every step declares an action, an expect condition, and an optional recovery handler:

- intent: click the Open button
  action:
    type: Click
    scope: main_window
    selector: ">> [role=button][name=Open]"
  expect:
    type: DialogPresent
    scope: main_window
  timeout: 10s

The engine runs the same lifecycle for every step:

1. Execute the action.
2. Poll expect every 100ms.
3. Condition passes → advance. Timeout → run recovery handler, then retry, skip, or fail.

No sleeps. No hardcoded waits. Recovery handlers are declared once and fire wherever their trigger condition is met: a dialog appearing mid-workflow, a confirmation prompt, a progress bar that needs to clear.
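The lifecycle above can be sketched in a few lines of Python. This is an illustrative model only, not the engine's actual implementation: the `Step` class, `run_step`, and the `max_retries` field are hypothetical names, and the retry policy (recover, then re-run the action) is an assumption. The short pause between polls is the polling interval, not a fixed wait for the app.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """Hypothetical in-memory form of one workflow step."""
    intent: str
    action: Callable[[], None]        # performs the UI action
    expect: Callable[[], bool]        # True once the expected state is observed
    timeout_s: float = 10.0
    recover: Optional[Callable[[], None]] = None  # optional recovery handler
    max_retries: int = 1              # assumed retry budget after recovery


def run_step(step: Step,
             poll_interval_s: float = 0.1,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> str:
    """Execute the action, poll the expect condition, recover on timeout."""
    for attempt in range(step.max_retries + 1):
        step.action()
        deadline = clock() + step.timeout_s
        while clock() < deadline:
            if step.expect():
                return "ok"           # condition passed: advance
            sleep(poll_interval_s)    # pause between polls only
        # Timed out: fail unless a recovery handler and a retry remain.
        if step.recover is None or attempt == step.max_retries:
            return "failed"
        step.recover()
    return "failed"
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real delays; a caller that omits them gets real time and the 100ms interval described above.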

Demo video: notepad-demo.mp4

See the notepad demo workflow for a complete example with phases, anchors, recovery handlers, and flow control.

Selectors

CSS-like paths over Windows UI Automation properties:

>> [role=edit][name='File name:']           # descendant edit field
>  [role=button][name^=Don][name$=Save]     # direct child: "Don't Save"
>> [role=list item]:nth(0)                  # first list item
>> [role=list item][name~=Wing]:parent      # parent of matching item
>> [id=SettingsPageAbout_New]               # by AutomationId (locale-stable)
>> [role=button|menu item]                  # OR: matches either role

Works across Win32, WPF, WinForms, WinUI, and UWP. → Full reference
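To make the attribute syntax concrete, here is a minimal Python sketch that evaluates the bracketed predicates of a single selector step against a dictionary of element properties. It is not the project's parser. It ignores the `>>` and `>` combinators and the `:nth` and `:parent` modifiers, it assumes matching is case-sensitive, and it assumes `~=` means "contains"; the full reference is the authority on the real semantics. The function names and the dictionary representation of an element are hypothetical.

```python
import re

# One [key<op>value] predicate; the value may be bare or single-quoted.
_PREDICATE = re.compile(r"\[(\w+)(\^=|\$=|~=|=)(?:'([^']*)'|([^\]]*))\]")

_OPS = {
    "=":  lambda actual, want: actual == want,
    "^=": lambda actual, want: actual.startswith(want),
    "$=": lambda actual, want: actual.endswith(want),
    "~=": lambda actual, want: want in actual,  # assumed: substring match
}


def parse_predicates(step: str):
    """Return [(key, op, [alternatives])] for one selector step."""
    parsed = []
    for key, op, quoted, bare in _PREDICATE.findall(step):
        # Bare values may list alternatives separated by "|".
        alternatives = [quoted] if quoted else bare.split("|")
        parsed.append((key, op, alternatives))
    return parsed


def matches(element: dict, step: str) -> bool:
    """True if every predicate in the step holds for the element."""
    for key, op, alternatives in parse_predicates(step):
        actual = element.get(key, "")
        if not any(_OPS[op](actual, want) for want in alternatives):
            return False
    return True
```

For example, `matches({"role": "button", "name": "Don't Save"}, "[role=button][name^=Don][name$=Save]")` evaluates to `True`: combining a prefix and a suffix test sidesteps quoting the apostrophe.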

The shadow DOM

Windows UI Automation is a cross-process RPC protocol: every element query is a round-trip to the target process. Walk a path of nested elements and each level is a separate cross-process call. Traditional automation tools pay this cost on every step; a 20-step workflow makes 20+ round-trips, re-discovering structure it already found the step before.

UI Automata's answer is the shadow DOM: a cached mirror of the live element tree. Handles are resolved once and reused for every subsequent step (a cached lookup is effectively free compared to a live UIA query). Think of it as the inverse of React's virtual DOM: React maintains a virtual tree to efficiently write to a UI it controls; the shadow DOM maintains one to efficiently read from a UI it does not.

→ How the shadow DOM works
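The caching idea can be illustrated with a small Python sketch. It models only the principle (resolve a handle once, reuse it, re-resolve when it goes stale) and says nothing about how UI Automata actually stores or invalidates its tree. `HandleCache`, `StaleHandle`, and the `resolve` callback are hypothetical names, and the single-retry policy is an assumption.

```python
from typing import Callable, Dict, Hashable


class StaleHandle(Exception):
    """Raised by an action when a cached handle no longer points at a live element."""


class HandleCache:
    def __init__(self, resolve: Callable[[str], Hashable]):
        self._resolve = resolve   # the expensive live query (cross-process round-trips)
        self._handles: Dict[str, Hashable] = {}
        self.live_queries = 0     # counts how often the live tree was actually queried

    def get(self, selector: str):
        """Return the cached handle, resolving it live only on first use."""
        if selector not in self._handles:
            self.live_queries += 1
            self._handles[selector] = self._resolve(selector)
        return self._handles[selector]

    def invalidate(self, selector: str) -> None:
        """Drop a cached handle, for example after the app rebuilds its UI."""
        self._handles.pop(selector, None)

    def with_handle(self, selector: str, use: Callable):
        """Run use(handle); on a stale handle, re-resolve once and retry."""
        try:
            return use(self.get(selector))
        except StaleHandle:
            self.invalidate(selector)
            return use(self.get(selector))
```

In this model, 20 steps against the same element cost one live query instead of 20, and a UI rebuild costs one extra query rather than a silent click on the wrong target.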

Agent tools

The included MCP server (automata-agent) gives an AI agent direct access to the Windows desktop:

desktop — list windows, walk the UIA element tree, test selectors live
vision — OCR and visual layout capture for apps with incomplete UIA support
app — launch apps, list installed apps, manage windows via the taskbar
window — minimize, maximize, restore, reposition, or screenshot a window by HWND
run_actions — run ad-hoc UI automation steps without a workflow file
start_workflow — run a named workflow and stream per-phase progress
workflow — list workflows, check status, cancel runs, browse run history, lint YAML
input — raw mouse and keyboard input, works on any window regardless of UIA support
clipboard — read or write the Windows clipboard
browser — control Edge via CDP: navigate, evaluate JavaScript, read the DOM
file — read, write, copy, move, delete, glob, stat
system — shell execution, process management, system diagnostics
resources — browse the embedded workflow library
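For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request in Python. The envelope follows the MCP specification, but the argument names passed to the `desktop` tool are hypothetical placeholders; the real argument schema comes from the server's tool listing. In practice an MCP client library builds these messages for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for the named tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical arguments: the real schema is defined by automata-agent.
message = tool_call("desktop", {"action": "list_windows"})
```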

Compared to vision-based agents

Vision agents work by taking a screenshot and asking an inference model what to click next. UI Automata uses Windows UI Automation directly, with vision as a fallback (not the primary path):

| | UI Automata | Vision agent |
|---|---|---|
| Approach | UIA elements + DOM query + vision | Screenshot only |
| Reliability | Deterministic: same selector works across runs | May vary across runs |
| Speed | Sub-second per step | Round-trip to inference API per step |
| Cost | Low: runs locally, no per-step inference | High: every step consumes tokens |
| Vision | On-device, used as fallback | Cloud inference, primary approach |
| Platform | Windows (all frameworks) | macOS-first, limited Windows |
| Model dependency | Any agent, any model | Locked to specific providers |
| Browser automation | CDP (structured page access) | Screenshot of browser |
| Trace | Structured log with full action detail | Sequence of screenshots |

The two approaches are complementary: use UI Automata for deterministic, repeatable workflows; use vision when you need to handle unfamiliar apps or pages on the fly.

Community

Have a question or want to share what you've built? Join the conversation on GitHub Discussions.

Found a bug? Want a feature? Open an issue.
