A Python MCP server that gives Claude (or any MCP client) the ability to interact with any website without requiring a dedicated connector, native API, or pre-built integration.
Claude passes a URL and a goal. AIL handles all perception, decision-making, execution, safety, and memory internally. Claude never touches a browser and never sees the DOM. It receives only the final result.

    ┌────────────────────────────────────────────────────────────────────┐
    │  MCP Client (Claude Desktop / claude CLI)                          │
    │                                                                    │
    │  browse(url, goal, conversation_id?) ──────────────────────────┐   │
    └────────────────────────────────────────────────────────────────┼───┘
                                                                     │
    ┌────────────────────────────────────────────────────────────────▼───┐
    │  server.py (FastMCP)                                               │
    │  • One tool: browse()                                              │
    │  • session_id = SessionManager.make_session_id(...)                │
    │  • Delegates to AgentController.run()                              │
    │  • Never raises; returns errors as strings                         │
    └────────────────────────────────────────────────────────────────┬───┘
                                                                     │
    ┌────────────────────────────────────────────────────────────────▼───┐
    │  controller.py                                                     │
    │                                                                    │
    │  SessionManager                                                    │
    │  ├─ make_session_id(url, goal, conversation_id)                    │
    │  ├─ get_or_create(session_id) → {context, page, timestamps}        │
    │  └─ cleanup_expired()  (TTL = 30 min)                              │
    │                                                                    │
    │  AgentController                                                   │
    │  └─ run(url, goal, session_id)                                     │
    │  ┌── while step_count < 20 ─────────────────────────────────────┐  │
    │  │ 1. perceive(page)            → elements[]                    │  │
    │  │ 2. decide(elements, goal)    → action{}                      │  │
    │  │ 3. safety.validate(action)   → valid? destructive?           │  │
    │  │ 4. executor.execute(action)  → result{}                      │  │
    │  │ 5. check_completion()        → bool                          │  │
    │  └──────────────────────────────────────────────────────────────┘  │
    └─────────┬───────────────────────┬───────────────────────┬──────────┘
              │                       │                       │
    ┌─────────▼─────────┐   ┌─────────▼─────────┐   ┌─────────▼─────────┐
    │ perception.py     │   │ executor.py       │   │ safety.py         │
    │                   │   │                   │   │                   │
    │ Layer 1:          │   │ click()           │   │ type check        │
    │   a11y tree       │   │ type()            │   │ id exists         │
    │   ↓ <0.6          │   │ select()          │   │ enabled+vis       │
    │ Layer 2:          │   │ wait()            │   │ destructive?      │
    │   DOM query       │   │ scroll()          │   │ step limit        │
    │   ↓ <5 els        │   │                   │   │ timeout           │
    │ Layer 3:          │   │ Fingerprint       │   └───────────────────┘
    │   Vision/Haiku    │   │ element_map       │
    └───────────────────┘   └───────────────────┘
    ┌───────────────────┐   ┌───────────────────┐
    │ memory.py         │   │ logger.py         │
    │                   │   │                   │
    │ ~/.ail/           │   │ ail.log           │
    │   memory/         │   │ JSON lines        │
    │     {domain}.json │   │ AIL_DEBUG=1       │
    └───────────────────┘   └───────────────────┘
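
The control flow in the diagram can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's actual `controller.py`: the callables `perceive`, `decide`, `validate`, `execute`, and `is_complete` are hypothetical stand-ins passed in as parameters, and the dict shapes and returned strings are assumptions. Only the five-step order and the 20-step limit are taken from the diagram.

```python
from typing import Any, Callable

MAX_STEPS = 20  # step limit shown in the diagram

Action = dict[str, Any]
Elements = list[dict]


def run_loop(
    perceive: Callable[[], Elements],
    decide: Callable[[Elements], Action],
    validate: Callable[[Action, Elements], tuple[bool, bool]],
    execute: Callable[[Action], dict],
    is_complete: Callable[[dict], bool],
) -> str:
    """Hypothetical perceive -> decide -> validate -> execute -> check loop."""
    step_count = 0
    while step_count < MAX_STEPS:
        step_count += 1
        elements = perceive()                    # 1. what is on the page now
        action = decide(elements)                # 2. choose the next action
        valid, destructive = validate(action, elements)  # 3. safety gate
        if not valid:
            continue  # invalid action: perceive again and re-decide
        if destructive:
            # Assumed behavior: stop and ask instead of executing.
            return f"confirmation required: {action}"
        result = execute(action)                 # 4. act on the page
        if is_complete(result):                  # 5. goal reached?
            return f"done in {step_count} step(s)"
    return f"stopped: step limit of {MAX_STEPS} reached"
```

Every exit path returns a string, which mirrors the "never raises" rule stated for `server.py`.
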
Install the dependencies and the Chromium browser:

    pip install anthropic playwright fastmcp
    playwright install chromium

| Variable | Required | Default | Description |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | Yes | (none) | Anthropic API key |
| `AIL_DEBUG` | No | `0` | Set to `1` for DEBUG-level JSON logs |

Set your API key and start the server:

    export ANTHROPIC_API_KEY=your_key
    python server.py

Add this block to your `claude_desktop_config.json`:

    {
      "mcpServers": {
        "agent-interaction-layer": {
          "command": "python",
          "args": ["/absolute/path/to/agent_interface_layer/server.py"],
          "env": {
            "ANTHROPIC_API_KEY": "your_key_here"
          }
        }
      }
    }

Config file locations:

- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
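
If you would rather add the entry with a script than by hand, the sketch below shows one way to do it. The helper names `config_path` and `add_server` are made up for this example, and only the two platforms listed above are handled. It merges into an existing config dict rather than overwriting it, so other configured servers are kept.

```python
import os
import sys
from pathlib import Path

CONFIG_NAME = "claude_desktop_config.json"


def config_path() -> Path:
    """Return the config location listed above for the current platform."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / CONFIG_NAME
    raise RuntimeError("only the macOS and Windows locations are listed here")


def add_server(config: dict, server_py: str, api_key: str) -> dict:
    """Return a copy of `config` with the agent-interaction-layer entry merged in."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["agent-interaction-layer"] = {
        "command": "python",
        "args": [server_py],
        "env": {"ANTHROPIC_API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged
```

A typical use would be to load the file at `config_path()` with `json.loads`, pass the result through `add_server`, review the output, and write it back.
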

Run the example script:

    export ANTHROPIC_API_KEY=your_key
    python example_usage.py

Runs three scenarios:

- Fresh session: new `conversation_id`, new browser context
- Session reuse: same `conversation_id`, different goal, context reused
- Memory hit: new conversation, same goal, SiteMemory hint injected
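
The memory-hit scenario depends on per-domain storage under `~/.ail/memory/{domain}.json`. The sketch below illustrates that idea under stated assumptions: the name `SiteMemory` comes from the list above, but the methods `remember` and `hint_for`, the `root` parameter, and the JSON layout (goal mapped to a list of step descriptions) are hypothetical and may differ from what `memory.py` actually does.

```python
import json
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


class SiteMemory:
    """Hypothetical per-domain store: one JSON file per domain."""

    def __init__(self, root: Optional[Path] = None) -> None:
        # Default location follows the ~/.ail/memory/ layout in the diagram.
        self.root = root if root is not None else Path.home() / ".ail" / "memory"

    def _path(self, url: str) -> Path:
        domain = urlparse(url).hostname or "unknown"
        return self.root / f"{domain}.json"

    def remember(self, url: str, goal: str, steps: list[str]) -> None:
        """Record the steps that achieved `goal` on this URL's domain."""
        path = self._path(url)
        data = json.loads(path.read_text()) if path.exists() else {}
        data[goal] = steps
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data, indent=2))

    def hint_for(self, url: str, goal: str) -> Optional[list[str]]:
        """Return stored steps for the same goal on the same domain, if any."""
        path = self._path(url)
        if not path.exists():
            return None
        return json.loads(path.read_text()).get(goal)
```

Keying by domain rather than by full URL lets a hint recorded on one page of a site be offered on another page of the same site.
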

| File | Role |
|---|---|
| `server.py` | FastMCP entry point, `browse()` tool |
| `perception.py` | 3-layer element extraction (a11y → DOM → vision) |
| `executor.py` | Only module that touches Playwright |
| `safety.py` | Action validation, destructive pattern guard |
| `memory.py` | Per-domain JSON pattern storage |
| `controller.py` | Agent loop, session management, LLM calls |
| `logger.py` | Structured JSON logging to `ail.log` |
| `example_usage.py` | Three end-to-end demo scenarios |
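
The three-layer extraction in `perception.py` moves to the next layer only when the current one is judged too poor. A minimal sketch of that fall-through rule follows, using the thresholds from the diagram (quality score below 0.6 for the accessibility tree, fewer than 5 elements for the DOM query). The three extractor callables and their return shapes are assumptions made for illustration.

```python
from typing import Callable

A11Y_MIN_QUALITY = 0.6  # below this, fall through to the DOM query
DOM_MIN_ELEMENTS = 5    # below this, fall through to vision

Elements = list[dict]


def perceive(
    a11y: Callable[[], tuple[Elements, float]],
    dom: Callable[[], Elements],
    vision: Callable[[], Elements],
) -> tuple[str, Elements]:
    """Return (layer_used, elements); only one layer's output is ever used."""
    elements, quality = a11y()
    if quality >= A11Y_MIN_QUALITY:
        return "accessibility", elements
    elements = dom()
    if len(elements) >= DOM_MIN_ELEMENTS:
        return "dom", elements
    # Last resort; the diagram labels this layer Vision/Haiku.
    return "vision", vision()
```

Each layer's result replaces the previous one instead of being merged with it, so later layers are only called when they are needed.
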

- Reactive loop, no upfront planning: the agent adapts to what's actually on the page each step. Planning before page load is brittle on arbitrary UIs.
- Layers run exclusively: accessibility tree preferred; fall through only on quality failure. No wholesale merging.
- Fingerprint-based IDs: `role + name + description + parent_role + index_within_parent`. Never incrementing integers. Never bounding-box coordinates.
- LLM controls nothing directly: all browser interactions go through `ActionExecutor` only.
- Session isolation by conversation: `session_id = md5(conversation_id + url + goal)`. Never share sessions across callers.
- Replan context decays in exactly 3 steps: not one-shot, not permanent.
- Destructive actions always require confirmation: never auto-executed.
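
Two of the decisions above are easy to illustrate with `hashlib`. In this sketch, `make_session_id` follows the `md5(conversation_id + url + goal)` formula as written, while `element_fingerprint` is an assumption: the list names the five fields but not how they are combined, so the `|` separator, the SHA-1 hash, and the 12-character truncation are illustrative choices only.

```python
import hashlib


def element_fingerprint(role: str, name: str, description: str,
                        parent_role: str, index_within_parent: int) -> str:
    """Derive an element ID from the five fields named above (assumed scheme)."""
    raw = "|".join([role, name, description, parent_role, str(index_within_parent)])
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:12]


def make_session_id(url: str, goal: str, conversation_id: str) -> str:
    """md5(conversation_id + url + goal), as stated in the list above."""
    raw = conversation_id + url + goal
    return hashlib.md5(raw.encode("utf-8")).hexdigest()
```

Because the fingerprint depends only on semantic properties, the same button keeps its ID when the page re-renders or its position shifts. Under the session formula as written, changing any one of the three inputs yields a different ID.
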

Logs are written to `ail.log` in the working directory. Each line is a JSON object:

    {"level": "INFO", "logger": "ail.perception", "message": "perception_layer", "layer_used": "accessibility", "element_count": 42, "quality_score": 0.81}
    {"level": "INFO", "logger": "ail.controller", "message": "llm_call", "model": "claude-haiku-4-5-20250307", "prompt_tokens_estimate": 310}
    {"level": "INFO", "logger": "ail.executor", "message": "action_executed", "action": "click", "label": "Search Flights", "success": true, "retries": 0}
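
Since each log line is a standalone JSON object, the file can be read with the standard library alone. The helpers below are a small sketch for inspecting a run. The function names `parse_log` and `summarize` are made up for this example, and the only field they rely on is `message`, which appears in the sample lines above.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def parse_log(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def summarize(records: Iterable[dict]) -> Counter:
    """Count records per value of the `message` field."""
    return Counter(r.get("message", "") for r in records)
```

To summarize a real run, open `ail.log` and pass the file object straight to `parse_log`, since a text file iterates line by line.
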