Skip to content

samarthm04/agent-interface-layer

Repository files navigation

Agent Interaction Layer (AIL)

A Python MCP server that gives Claude (or any MCP client) the ability to interact with any website without requiring a dedicated connector, native API, or pre-built integration.

Claude passes a URL and a goal. AIL handles all perception, decision-making, execution, safety, and memory internally. Claude never touches a browser. Claude never sees DOM. It receives only the final result.


Architecture

┌────────────────────────────────────────────────────────────────────┐
│  MCP Client (Claude Desktop / claude CLI)                          │
│                                                                    │
│    browse(url, goal, conversation_id?)  ──────────────────────┐   │
└───────────────────────────────────────────────────────────────┼───┘
                                                                │
┌───────────────────────────────────────────────────────────────▼───┐
│  server.py  (FastMCP)                                              │
│  • One tool: browse()                                              │
│  • session_id = SessionManager.make_session_id(...)               │
│  • Delegates to AgentController.run()                              │
│  • Never raises — returns errors as strings                        │
└───────────────────────────────────────────────────────────────┬───┘
                                                                │
┌───────────────────────────────────────────────────────────────▼───┐
│  controller.py                                                     │
│                                                                    │
│  SessionManager                                                    │
│  ├─ make_session_id(url, goal, conversation_id)                    │
│  ├─ get_or_create(session_id)  →  {context, page, timestamps}     │
│  └─ cleanup_expired()  (TTL = 30 min)                              │
│                                                                    │
│  AgentController                                                   │
│  └─ run(url, goal, session_id)                                     │
│     ┌── while step_count < 20 ──────────────────────────────────┐ │
│     │  1. perceive(page)           → elements[]                  │ │
│     │  2. decide(elements, goal)   → action{}                    │ │
│     │  3. safety.validate(action)  → valid? destructive?         │ │
│     │  4. executor.execute(action) → result{}                    │ │
│     │  5. check_completion()       → bool                        │ │
│     └─────────────────────────────────────────────────────────-─┘ │
└───────────────┬──────────────────┬────────────┬────────────────────┘
                │                  │            │
    ┌───────────▼──┐  ┌────────────▼──┐  ┌─────▼──────────┐
    │ perception.py │  │  executor.py  │  │   safety.py    │
    │               │  │               │  │                │
    │  Layer 1:     │  │  click()      │  │  type check    │
    │  a11y tree    │  │  type()       │  │  id exists     │
    │     ↓ <0.6    │  │  select()     │  │  enabled+vis   │
    │  Layer 2:     │  │  wait()       │  │  destructive?  │
    │  DOM query    │  │  scroll()     │  │  step limit    │
    │     ↓ <5 els  │  │               │  │  timeout       │
    │  Layer 3:     │  │  Fingerprint  │  └────────────────┘
    │  Vision/Haiku │  │  element_map  │
    └───────────────┘  └───────────────┘

    ┌───────────────┐  ┌───────────────┐
    │  memory.py    │  │  logger.py    │
    │               │  │               │
    │  ~/.ail/      │  │  ail.log      │
    │  memory/      │  │  JSON lines   │
    │  {domain}.json│  │  AIL_DEBUG=1  │
    └───────────────┘  └───────────────┘

Install

pip install anthropic playwright fastmcp
playwright install chromium

Environment Variables

Variable Required Default Description
ANTHROPIC_API_KEY Yes Anthropic API key
AIL_DEBUG No 0 Set to 1 for DEBUG-level JSON logs

Run as MCP Server

export ANTHROPIC_API_KEY=your_key
python server.py

Claude Desktop Configuration

Add this block to your claude_desktop_config.json:

{
  "mcpServers": {
    "agent-interaction-layer": {
      "command": "python",
      "args": ["/absolute/path/to/agent_interface_layer/server.py"],
      "env": {
        "ANTHROPIC_API_KEY": "your_key_here"
      }
    }
  }
}

Config file locations:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Example Usage (direct Python)

export ANTHROPIC_API_KEY=your_key
python example_usage.py

Runs three scenarios:

  1. Fresh session — new conversation_id, new browser context
  2. Session reuse — same conversation_id, different goal, context reused
  3. Memory hit — new conversation, same goal, SiteMemory hint injected

Module Summary

File Role
server.py FastMCP entry point, browse() tool
perception.py 3-layer element extraction (a11y → DOM → vision)
executor.py Only module that touches Playwright
safety.py Action validation, destructive pattern guard
memory.py Per-domain JSON pattern storage
controller.py Agent loop, session management, LLM calls
logger.py Structured JSON logging to ail.log
example_usage.py Three end-to-end demo scenarios

Key Design Decisions

  • Reactive loop, no upfront planning — the agent adapts to what's actually on the page each step. Planning before page load is brittle on arbitrary UIs.
  • Layers run exclusively — accessibility tree preferred; fall through only on quality failure. No wholesale merging.
  • Fingerprint-based IDsrole + name + description + parent_role + index_within_parent. Never incrementing integers. Never bounding-box coordinates.
  • LLM controls nothing directly — all browser interactions go through ActionExecutor only.
  • Session isolation by conversationsession_id = md5(conversation_id + url + goal). Never share sessions across callers.
  • Replan context decays in exactly 3 steps — not one-shot, not permanent.
  • Destructive actions always require confirmation — never auto-executed.

Logs

ail.log in the working directory. Each line is a JSON object:

{"level": "INFO", "logger": "ail.perception", "message": "perception_layer", "layer_used": "accessibility", "element_count": 42, "quality_score": 0.81}
{"level": "INFO", "logger": "ail.controller", "message": "llm_call", "model": "claude-haiku-4-5-20250307", "prompt_tokens_estimate": 310}
{"level": "INFO", "logger": "ail.executor", "message": "action_executed", "action": "click", "label": "Search Flights", "success": true, "retries": 0}

agent-interface-layer

About

A perception-driven, safety-constrained agent loop that enables LLMs to interact with arbitrary web interfaces using accessibility, DOM, and vision layers—fully abstracted behind MCP.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages