1 | 1 | # openadapt-types |
2 | | -Canonical Pydantic schemas for computer-use agents: ComputerState, Action, ActionResult, UINode |
| 2 | + |
| 3 | +Canonical Pydantic schemas for computer-use agents. |
| 4 | + |
| 5 | +``` |
| 6 | +pip install openadapt-types |
| 7 | +``` |
| 8 | + |
| 9 | +## What's in the box |
| 10 | + |
| 11 | +| Schema | Purpose | |
| 12 | +|--------|---------| |
| 13 | +| `ComputerState` | Screen state: screenshot + UI element graph + window context | |
| 14 | +| `UINode` | Single UI element with role, bbox, hierarchy, platform anchors | |
| 15 | +| `Action` | Agent action with typed action space + flexible targeting | |
| 16 | +| `ActionTarget` | Where to act: `node_id` > `description` > `(x, y)` coordinates | |
| 17 | +| `ActionResult` | Execution outcome with error taxonomy + state delta | |
| 18 | +| `Episode` / `Step` | Complete task trajectory (observation → action → result) | |
| 19 | +| `FailureRecord` | Classified failure for dataset pipelines | |
| 20 | + |
| 21 | +## Quick start |
| 22 | + |
| 23 | +```python |
| 24 | +from openadapt_types import ( |
| 25 | + Action, ActionTarget, ActionType, |
| 26 | + ComputerState, UINode, BoundingBox, |
| 27 | +) |
| 28 | + |
| 29 | +# Describe what's on screen |
| 30 | +state = ComputerState( |
| 31 | + viewport=(1920, 1080), |
| 32 | + nodes=[ |
| 33 | + UINode(node_id="n0", role="window", name="My App", children_ids=["n1"]), |
| 34 | + UINode(node_id="n1", role="button", name="Submit", parent_id="n0", |
| 35 | + bbox=BoundingBox(x=500, y=400, width=100, height=40)), |
| 36 | + ], |
| 37 | +) |
| 38 | + |
| 39 | +# Agent decides what to do |
| 40 | +action = Action( |
| 41 | + type=ActionType.CLICK, |
| 42 | + target=ActionTarget(node_id="n1"), |
| 43 | + reasoning="Click Submit to proceed", |
| 44 | +) |
| 45 | + |
| 46 | +# Render element tree for LLM prompts |
| 47 | +print(state.to_text_tree()) |
| 48 | +# [n0] window: My App |
| 49 | +# [n1] button: Submit |
| 50 | +``` |
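To illustrate how a flat node list with `parent_id` / `children_ids` links can be turned into the kind of text tree shown in the quick start, here is a minimal standalone sketch. It does not use `openadapt_types`: `Node` and `render_tree` are hypothetical stand-ins, and the two-space indentation is an assumption, so the real `to_text_tree()` output may be formatted differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in mirroring the UINode fields used in the quick start."""
    node_id: str
    role: str
    name: str
    parent_id: Optional[str] = None
    children_ids: List[str] = field(default_factory=list)


def render_tree(nodes: List[Node], indent: str = "  ") -> str:
    """Render nodes depth-first, one line per node, children indented under parents."""
    by_id: Dict[str, Node] = {node.node_id: node for node in nodes}
    lines: List[str] = []

    def visit(node: Node, depth: int) -> None:
        lines.append(f"{indent * depth}[{node.node_id}] {node.role}: {node.name}")
        for child_id in node.children_ids:
            child = by_id.get(child_id)
            if child is not None:  # skip dangling child references
                visit(child, depth + 1)

    # Roots are the nodes that have no parent.
    for node in nodes:
        if node.parent_id is None:
            visit(node, 0)
    return "\n".join(lines)


tree_text = render_tree([
    Node("n0", "window", "My App", children_ids=["n1"]),
    Node("n1", "button", "Submit", parent_id="n0"),
])
print(tree_text)
```

Keeping the node ID in each rendered line lets an LLM answer with a `node_id` that maps straight back to an element.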
| 51 | + |
| 52 | +## Action targeting |
| 53 | + |
| 54 | +`ActionTarget` supports three grounding strategies (in priority order): |
| 55 | + |
| 56 | +```python |
| 57 | +# 1. Element-based (preferred — most robust) |
| 58 | +ActionTarget(node_id="n1") |
| 59 | + |
| 60 | +# 2. Description-based (resolved by grounding module) |
| 61 | +ActionTarget(description="the blue submit button") |
| 62 | + |
| 63 | +# 3. Coordinate-based (fallback) |
| 64 | +ActionTarget(x=550, y=420) |
| 65 | +ActionTarget(x=0.29, y=0.39, is_normalized=True) |
| 66 | +``` |
| 67 | + |
| 68 | +Agents SHOULD produce `node_id` or `description`; the runtime resolves either one to screen coordinates.
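The priority order above can be sketched as a small resolver. This is an illustration only, not the package's runtime: `Box`, `Target`, `resolve_target`, and the `ground` callback are hypothetical stand-ins, and clicking the centre of the bounding box is an assumption about how a node would be resolved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Box:
    """Hypothetical stand-in for BoundingBox (pixel units)."""
    x: float
    y: float
    width: float
    height: float


@dataclass
class Target:
    """Hypothetical stand-in for ActionTarget."""
    node_id: Optional[str] = None
    description: Optional[str] = None
    x: Optional[float] = None
    y: Optional[float] = None
    is_normalized: bool = False


def resolve_target(
    target: Target,
    boxes: Dict[str, Box],
    viewport: Tuple[int, int],
    ground: Optional[Callable[[str], Optional[Tuple[float, float]]]] = None,
) -> Tuple[float, float]:
    """Return pixel coordinates, trying node_id, then description, then (x, y)."""
    # 1. Element-based: use the centre of the node's bounding box (assumed policy).
    if target.node_id is not None and target.node_id in boxes:
        box = boxes[target.node_id]
        return (box.x + box.width / 2, box.y + box.height / 2)
    # 2. Description-based: delegate to a caller-supplied grounding function.
    if target.description is not None and ground is not None:
        point = ground(target.description)
        if point is not None:
            return point
    # 3. Coordinate-based: scale normalized values by the viewport size.
    if target.x is not None and target.y is not None:
        if target.is_normalized:
            width, height = viewport
            return (target.x * width, target.y * height)
        return (target.x, target.y)
    raise ValueError("target could not be resolved to coordinates")


boxes = {"n1": Box(x=500, y=400, width=100, height=40)}
by_node = resolve_target(Target(node_id="n1"), boxes, (1920, 1080))
by_pixels = resolve_target(Target(x=550, y=420), boxes, (1920, 1080))
by_fraction = resolve_target(
    Target(x=0.5, y=0.5, is_normalized=True), boxes, (1920, 1080)
)
print(by_node, by_pixels, by_fraction)
```

Falling through to the next strategy when one fails means a stale `node_id` can still be rescued by a description or raw coordinates on the same target.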
| 69 | + |
| 70 | +## Compatibility with existing schemas |
| 71 | + |
| 72 | +Converters for three existing OpenAdapt schema formats: |
| 73 | + |
| 74 | +```python |
| 75 | +from openadapt_types._compat import ( |
| 76 | + from_benchmark_observation, # openadapt-evals BenchmarkObservation |
| 77 | + from_benchmark_action, # openadapt-evals BenchmarkAction |
| 78 | + from_ml_observation, # openadapt-ml Observation |
| 79 | + from_ml_action, # openadapt-ml Action |
| 80 | + from_omnimcp_screen_state, # omnimcp ScreenState |
| 81 | + from_omnimcp_action_decision, # omnimcp ActionDecision |
| 82 | +) |
| 83 | + |
| 84 | +# Convert existing data (obs and act are existing BenchmarkObservation / BenchmarkAction instances)
| 85 | +state = from_benchmark_observation(obs.__dict__) |
| 86 | +action = from_benchmark_action(act.__dict__) |
| 87 | +``` |
| 88 | + |
| 89 | +## JSON Schema |
| 90 | + |
| 91 | +Export for language-agnostic tooling: |
| 92 | + |
| 93 | +```python |
| 94 | +import json |
| 95 | +from openadapt_types import ComputerState, Action, Episode |
| 96 | + |
| 97 | +# Get JSON Schema |
| 98 | +schema = ComputerState.model_json_schema() |
| 99 | +print(json.dumps(schema, indent=2)) |
| 100 | +``` |
| 101 | + |
| 102 | +## Design principles |
| 103 | + |
| 104 | +- **Pydantic v2** — runtime validation, JSON Schema export, fast serialization |
| 105 | +- **Pixels + structure** — always capture both visual and semantic UI state |
| 106 | +- **Node graph** — full element tree, not just focused element |
| 107 | +- **Platform-agnostic** — same schema for Windows, macOS, Linux, web |
| 108 | +- **Extension-friendly** — `raw`, `attributes`, `metadata` fields everywhere |
| 109 | +- **Backward compatible** — `_compat` converters for gradual migration |
| 110 | + |
| 111 | +## Dependencies |
| 112 | + |
| 113 | +Just `pydantic>=2.0`. No ML libraries, no heavy deps. |
| 114 | + |
| 115 | +## License |
| 116 | + |
| 117 | +MIT |