# SimulateInput

English | [中文](#中文)

SimulateInput is a cross-platform desktop and browser automation platform for testing your own websites, desktop applications, installers, and system-level UI flows.

It combines direct input execution, multiple locator strategies, CLI and MCP interfaces, and YAML-driven reusable test cases, so the same automation core can be used by engineers, CI pipelines, and AI agents.

## Highlights

- Cross-platform driver architecture covering Windows, macOS, Linux X11, and a Linux Wayland compatibility layer
- Multiple locator strategies:
  - structured accessibility lookup
  - visible text lookup
  - OCR-based text lookup
  - image template matching
  - coordinate fallback
- Real input actions:
  - click
  - drag
  - type text
  - press key
  - hotkey
  - clear text
  - screenshot
- MCP server for AI tool calling
- YAML case runner for repeatable automation flows
- Skill definitions and references for AI-assisted execution
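The locator strategies above can be combined as an ordered fallback chain: try the most structured lookup first and drop to coordinates only when nothing else matches. The sketch below illustrates that idea in plain Python; the `Match` type, `locate_with_fallback`, the strategy functions, and the fake OCR data are all hypothetical stand-ins, not SimulateInput's actual locator API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Match:
    """Where a target was found and which strategy found it (hypothetical type)."""
    strategy: str
    x: int
    y: int


# A strategy is a (name, finder) pair; the finder returns a point or None.
Strategy = Tuple[str, Callable[[str], Optional[Tuple[int, int]]]]


def locate_with_fallback(query: str, strategies: Sequence[Strategy]) -> Match:
    """Try each strategy in order and return the first hit."""
    for name, find in strategies:
        point = find(query)
        if point is not None:
            return Match(name, *point)
    raise LookupError(f"no strategy could locate {query!r}")


# Hypothetical stand-ins: in this fake setup only OCR "sees" the Submit button.
fake_ocr_hits = {"Submit": (420, 310)}
strategies: List[Strategy] = [
    ("accessibility", lambda q: None),  # e.g. UIA / AX / AT-SPI lookup
    ("visible_text", lambda q: None),
    ("ocr", fake_ocr_hits.get),
    ("image", lambda q: None),
    ("coordinates", lambda q: (0, 0)),  # last-resort fixed point
]

match = locate_with_fallback("Submit", strategies)
print(match)  # Match(strategy='ocr', x=420, y=310)
```

Putting coordinates last reflects the usual trade-off: structured lookups tend to survive layout changes, while fixed coordinates break as soon as the window moves or resizes.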

## Current Platform Status

- Windows: primary implementation; real execution is smoke tested
- macOS: MVP driver implemented; requires Accessibility, Automation, and Screen Recording permissions
- Linux X11: MVP driver implemented; depends on `wmctrl`, `xdotool`, screenshot helpers, and optional AT-SPI tooling
- Linux Wayland: compatibility layer; depends on helper tools and is not yet at feature parity with Windows
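Because the Linux X11 driver relies on external helpers, a quick pre-flight check can catch a missing tool before a run fails halfway. A minimal standard-library sketch, assuming the helpers are found on `PATH` under their usual executable names; the screenshot helper is not named in this README, so it is left out, and this is not necessarily what the `doctor` command checks. `missing_helpers` is a hypothetical name.

```python
import shutil
import sys

# Helper executables named in the platform status list above.
# Assumption: they are looked up on PATH under these exact names.
LINUX_X11_HELPERS = ["wmctrl", "xdotool"]


def missing_helpers(names, which=shutil.which):
    """Return the helper names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    if sys.platform.startswith("linux"):
        missing = missing_helpers(LINUX_X11_HELPERS)
        print("missing helpers:", ", ".join(missing) if missing else "none")
    else:
        print("no Linux X11 helpers needed on", sys.platform)
```

The `which` parameter exists only so the check can be exercised without the tools installed; a CI job could treat a non-empty result as a failed setup step.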

## Repository Structure

- `src/simulateinput/`
  - core engine
  - platform drivers
  - locators
  - CLI
  - MCP server
  - case runner
- `docs/automation-platform-design.md`
  - architecture and implementation plan
- `docs/cross-platform-installation.md`
  - platform setup, dependencies, and permissions
- `skills/simulateinput/`
  - skill definition and CLI / MCP references
- `tests/`
  - unit tests and smoke case YAML files

## Quick Start

```powershell
$env:PYTHONPATH='src'
python -m simulateinput.cli.main doctor
python -m simulateinput.cli.main session start
python -m simulateinput.cli.main mcp tools
```

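## Typical CLI Workflow

```powershell
$env:PYTHONPATH='src'

python -m simulateinput.cli.main session start
python -m simulateinput.cli.main window list --session-id <session_id>
python -m simulateinput.cli.main window attach --session-id <session_id> --window-id <window_id>

python -m simulateinput.cli.main locate uia --session-id <session_id> --name "Submit"
python -m simulateinput.cli.main action click-uia --session-id <session_id> --name "Submit"
python -m simulateinput.cli.main action screenshot --session-id <session_id> --output artifacts/shot.png
```

## Running YAML Cases

```powershell
$env:PYTHONPATH='src'
python -m simulateinput.cli.main case run tests/e2e/cases/windows-smoke.yaml
```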
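For CI, the same `case run` command can be driven from a small Python script. This is a sketch under two assumptions that this README does not state: the script runs from the repository root, and `case run` exits with a non-zero status when a case fails. It uses only the command and paths shown above; `case_command`, `run_cases`, and `main` are hypothetical helper names.

```python
import os
import subprocess
import sys
from pathlib import Path

CASES_DIR = Path("tests/e2e/cases")


def case_command(case_path: Path) -> list:
    """Build the same command line as the PowerShell example above."""
    return [sys.executable, "-m", "simulateinput.cli.main", "case", "run", case_path.as_posix()]


def run_cases(case_paths, runner=subprocess.run) -> dict:
    """Run each case and collect its exit code (assumed non-zero on failure)."""
    env = dict(os.environ, PYTHONPATH="src")  # mirrors $env:PYTHONPATH='src'
    results = {}
    for path in case_paths:
        completed = runner(case_command(path), env=env)
        results[path.as_posix()] = completed.returncode
    return results


def main() -> int:
    """Run every YAML case under CASES_DIR and return a CI-friendly exit code."""
    results = run_cases(sorted(CASES_DIR.glob("*.yaml")))
    failed = [name for name, code in results.items() if code != 0]
    print(f"{len(results) - len(failed)} passed, {len(failed)} failed")
    return 1 if failed else 0
```

A CI step would call `sys.exit(main())`. Using `sys.executable` instead of a bare `python` keeps the cases on the same interpreter and virtual environment as the script itself.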
Example case:

```yaml
name: locator-smoke
profile: lab_default
steps:
  - action: attach_window
    title: Notepad

  - action: locate_text
    text: File

  - action: screenshot
    output: artifacts/locator-smoke.png
```

## MCP

Start the local MCP server:

```powershell
$env:PYTHONPATH='src'
python -m simulateinput.cli.main mcp serve
```

Current MCP capabilities include:

- session management
- window attach
- structured locators
- OCR and image locators
- click and drag actions
- keyboard actions
- screenshot capture
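MCP clients talk to a server over JSON-RPC 2.0: after an `initialize` handshake they call `tools/list` to discover tools and `tools/call` to invoke one. Assuming `mcp serve` uses the stdio transport (one JSON message per line), which this README does not state, the messages an AI client sends look roughly like the sketch below. The tool name `screenshot` and its `output` argument are hypothetical placeholders; `mcp tools` should show the tools the server actually exposes.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)


def request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request with an auto-incrementing id."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def notification(method: str) -> dict:
    """Notifications carry no id and expect no response."""
    return {"jsonrpc": "2.0", "method": method}


messages = [
    request("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1"},
    }),
    notification("notifications/initialized"),
    request("tools/list"),
    # Hypothetical tool name and arguments; check `mcp tools` for the real ones.
    request("tools/call", {
        "name": "screenshot",
        "arguments": {"output": "artifacts/mcp-shot.png"},
    }),
]

# With a stdio transport, each message is written as one line of JSON.
wire = "\n".join(json.dumps(m) for m in messages)
print(wire)
```

In practice an MCP-capable client (or an SDK) handles this framing; the sketch is only meant to show what "tool calling" means on the wire.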

## Installation

See `docs/cross-platform-installation.md` for:

- Python dependencies
- Tesseract OCR setup
- macOS permissions
- Linux helper packages
- platform smoke cases

## Documentation

- Architecture: `docs/automation-platform-design.md`
- Installation: `docs/cross-platform-installation.md`
- Skill: `skills/simulateinput/SKILL.md`
- CLI reference: `skills/simulateinput/references/cli-usage.md`
- MCP reference: `skills/simulateinput/references/mcp-tools.md`

## Safety Boundary

SimulateInput is intended for automation of your own software, test environments, and explicitly authorized systems.

It is not intended for bypassing third-party anti-bot controls, CAPTCHAs, or unrelated security mechanisms.

---

## 中文

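SimulateInput is a cross-platform desktop and browser automation testing platform for testing your own web pages, desktop software, installers, and system-level UI flows.

It brings real input execution, multiple locator strategies, CLI / MCP interfaces, and reusable YAML test cases together in one automation core, which engineers can use directly and which can also be connected to CI and AI agents.
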
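## Core Capabilities

- Cross-platform driver architecture: Windows, macOS, Linux X11, and a Linux Wayland compatibility layer
- Multiple locator strategies:
  - structured accessibility / control-tree lookup
  - visible text lookup
  - OCR text lookup
  - image template lookup
  - coordinate fallback
- Real input actions:
  - click
  - drag
  - text input
  - single key press
  - key combinations
  - clear text
  - screenshot
- MCP service that AI can call through tools
- YAML case runner for executing reusable automated test flows
- Skill docs and reference material prepared for AI use
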
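## Current Platform Status

- Windows: primary implementation; real execution and smoke tests completed
- macOS: MVP driver completed; depends on Accessibility / Automation / Screen Recording permissions
- Linux X11: MVP driver completed; depends on `wmctrl`, `xdotool`, a screenshot tool, and an optional AT-SPI environment
- Linux Wayland: currently a compatibility layer that depends on external helpers; its capabilities are not yet equivalent to Windows
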
## Repository Structure

- `src/simulateinput/`
  - core engine
  - platform drivers
  - locators
  - CLI
  - MCP service
  - case runner
- `docs/automation-platform-design.md`
  - overall design document
- `docs/cross-platform-installation.md`
  - cross-platform installation, dependency, and permission notes
- `skills/simulateinput/`
  - AI skill definition and CLI / MCP references
- `tests/`
  - unit tests and smoke case YAML

## Quick Start

```powershell
$env:PYTHONPATH='src'
python -m simulateinput.cli.main doctor
python -m simulateinput.cli.main session start
python -m simulateinput.cli.main mcp tools
```

## Common CLI Flow

```powershell
$env:PYTHONPATH='src'

python -m simulateinput.cli.main session start
python -m simulateinput.cli.main window list --session-id <session_id>
python -m simulateinput.cli.main window attach --session-id <session_id> --window-id <window_id>

python -m simulateinput.cli.main locate uia --session-id <session_id> --name "Submit"
python -m simulateinput.cli.main action click-uia --session-id <session_id> --name "Submit"
python -m simulateinput.cli.main action screenshot --session-id <session_id> --output artifacts/shot.png
```

## Running YAML Cases

```powershell
$env:PYTHONPATH='src'
python -m simulateinput.cli.main case run tests/e2e/cases/windows-smoke.yaml
```

Example:

```yaml
name: locator-smoke
profile: lab_default
steps:
  - action: attach_window
    title: Notepad

  - action: locate_text
    text: File

  - action: screenshot
    output: artifacts/locator-smoke.png
```

## MCP Integration

Start the local MCP server:

```powershell
$env:PYTHONPATH='src'
python -m simulateinput.cli.main mcp serve
```

The MCP server currently supports:

- session management
- window attach
- structured locators
- OCR / image locators
- click and drag
- keyboard actions
- screenshots

## Installation Notes

See `docs/cross-platform-installation.md` for details, including:

- Python dependencies
- Tesseract OCR installation
- macOS permission setup
- Linux helper tool installation
- platform smoke case notes

## Documentation

- Architecture design: `docs/automation-platform-design.md`
- Installation guide: `docs/cross-platform-installation.md`
- Skill: `skills/simulateinput/SKILL.md`
- CLI reference: `skills/simulateinput/references/cli-usage.md`
- MCP reference: `skills/simulateinput/references/mcp-tools.md`

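## Safety Boundary

SimulateInput should only be used on:

- your own software
- test environments
- explicitly authorized systems

It is not intended for bypassing third-party anti-automation mechanisms, CAPTCHAs, or unrelated security controls.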