Browser automation tool (headless, persistent profiles, sidecar, opt-in) #16

## Summary
Give the agent the ability to read JS-heavy pages and perform actions on the user's behalf via a
headless browser, while keeping the core container lean and the capability quarantined.

## Approach
Follow the existing "CLI/tool handles protocol complexity" pattern:
- `tools/browser.py` wrapping headless Playwright with a small verb set: `goto`, `read`,
  `screenshot`, `click`, `fill`, `submit`.
- **Persistent authenticated profiles** (a `user-data-dir` per persona/site under `data/`): log in
  once, then reuse the real cookies/session. This is the highest-leverage reliability measure.
- Run Chromium in a **sidecar compose service**, not the main image, to keep the core small.
- Register in the optional-tools registry (`core/tools.py`), **disabled by default** (same shape as
  the `gh` integration); advertised to the model only when enabled.
- A `browser.md` skill documents the verbs and conventions.
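
The verb set and per-persona profile layout above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual `tools/browser.py`: the names `profile_dir`, `open_page`, and `run_verb`, the `data/browser-profiles/` layout, the argument names, and the choice to implement `submit` as pressing Enter are all hypothetical. It uses Playwright's sync API and launches Chromium locally for brevity; the sidecar variant would connect to a remote browser instead.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Any

# Assumed layout: one Chromium user-data-dir per persona/site under data/.
PROFILE_ROOT = Path("data") / "browser-profiles"


def _safe(part: str) -> str:
    """Reduce an arbitrary label to a single filesystem-safe path component."""
    cleaned = re.sub(r"[^a-z0-9.-]+", "_", part.strip().lower()).strip(".")
    return cleaned or "_"


def profile_dir(persona: str, site: str, root: Path = PROFILE_ROOT) -> Path:
    """Map a persona/site pair to its persistent profile directory."""
    return root / _safe(persona) / _safe(site)


def open_page(persona: str, site: str):
    """Launch headless Chromium on the persisted profile.

    Returns (playwright, context, page); the caller closes the context
    and stops Playwright when done.
    """
    # Imported lazily so the module loads even when the optional tool is off.
    from playwright.sync_api import sync_playwright

    pw = sync_playwright().start()
    path = profile_dir(persona, site)
    path.mkdir(parents=True, exist_ok=True)
    context = pw.chromium.launch_persistent_context(str(path), headless=True)
    page = context.pages[0] if context.pages else context.new_page()
    return pw, context, page


def run_verb(page: Any, verb: str, **args: Any) -> Any:
    """Dispatch one of the six verbs onto a Playwright-style page object."""
    if verb == "goto":
        page.goto(args["url"])
        return page.url
    if verb == "read":
        return page.inner_text(args.get("selector", "body"))
    if verb == "screenshot":
        # Returns the PNG bytes; also writes to `path` when one is given.
        return page.screenshot(path=args.get("path"), full_page=True)
    if verb == "click":
        page.click(args["selector"])
        return None
    if verb == "fill":
        page.fill(args["selector"], args["value"])
        return None
    if verb == "submit":
        # Assumption: submitting means pressing Enter in the given field.
        page.press(args["selector"], "Enter")
        return None
    raise ValueError(f"unknown verb: {verb!r}")
```

Because `run_verb` only relies on the page object's methods, it can be exercised against a stub page in tests without starting a browser.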

## Distinctions worth encoding
- **Browser-as-renderer** (load + read/screenshot): low detection, broadly reliable.
- **Browser-as-actor** (log in + click + submit): higher detection, MFA, ToS exposure; gate behind
  the permission engine.
- Prefer an existing API/CLI over the browser whenever one exists; browser automation is a last
  resort.
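
One way the renderer/actor split might be encoded is as a small decision function in front of the verb dispatcher. The sketch below is illustrative only: `DomainRules`, `decide`, the three outcomes (`allow`, `ask`, `deny`), and the policy that renderer verbs are always allowed while actor verbs default to deny are assumptions, not the permission engine's actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlsplit

RENDER_VERBS = frozenset({"goto", "read", "screenshot"})
ACTOR_VERBS = frozenset({"click", "fill", "submit"})


@dataclass
class DomainRules:
    """Hypothetical per-domain rules: domain -> "ask" or "allow" for actor verbs."""
    actor: dict[str, str] = field(default_factory=dict)


def decide(verb: str, url: str, rules: DomainRules) -> str:
    """Return "allow", "ask", or "deny" for a verb aimed at a URL."""
    if verb in RENDER_VERBS:
        return "allow"  # assumed policy: load/read/screenshot is low risk
    if verb not in ACTOR_VERBS:
        return "deny"   # unknown verbs never run
    host = (urlsplit(url).hostname or "").lower()
    # A rule matches its own domain and any subdomain; the most specific wins.
    matches = [d for d in rules.actor if host == d or host.endswith("." + d)]
    if not matches:
        return "deny"   # actor verbs are denied unless a rule covers the domain
    return rules.actor[max(matches, key=len)]
```

For example, with rules `{"example.com": "ask", "shop.example.com": "allow"}`, a `submit` on `shop.example.com` is allowed outright, a `fill` on `www.example.com` goes to the approval flow, and a `click` on any other domain is denied.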

## Known limitations (set expectations)
Sites behind major bot-management / anti-automation services, or interactive challenges, may block
headless automation. Persistent authenticated sessions mitigate the common cases but not the hardest
tier. Residential proxies / challenge-solvers are explicitly out of scope.

## UX & product
- **The login/auth flow is the key UX problem** — design it explicitly: (a) import an existing
  logged-in session, (b) a guided "log in on a trusted device, then we reuse the session" flow, or
  (c) a hosted interactive browser view in the admin UI for the one-time login. Capture the chosen
  flow as a sub-task; do not assume the naive case.
- **Admin UI:** enable/disable toggle (off by default), a per-domain permission-rule editor, saved
  profiles with auth status, and a "test" action — **responsive/touch-friendly** at phone width,
  reusing consistent toggle + list + approval components.
- **On the go (Telegram):** the agent **sends screenshots** so the user follows along on their
  phone; state-changing actions use the existing inline approve/deny flow with consistent button
  conventions.
- **Mobile-first:** watching and approving a browser action from Telegram (with a screenshot) is a
  first-class path; full logs live in the web UI.
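
The Telegram approval message could be assembled along these lines. This is a sketch, assuming the approval is sent with the Bot API's `sendPhoto` method (screenshot as the photo, action summary as the caption, an inline keyboard for the buttons). The function names, button labels, and the `browser:<decision>:<action_id>` callback format are hypothetical; the project's existing approve/deny conventions should take precedence.

```python
from __future__ import annotations

import json

CAPTION_LIMIT = 1024      # Telegram's caption length limit for photos
CALLBACK_DATA_LIMIT = 64  # Telegram's callback_data limit, in bytes


def approval_request(chat_id: int, action_id: str, verb: str,
                     target: str, domain: str) -> dict:
    """Build the non-file fields of a sendPhoto call for one pending action.

    The screenshot itself is attached separately as the `photo` field.
    """
    approve = f"browser:approve:{action_id}"
    deny = f"browser:deny:{action_id}"
    for data in (approve, deny):
        if len(data.encode("utf-8")) > CALLBACK_DATA_LIMIT:
            raise ValueError("action_id too long for Telegram callback_data")

    caption = f"Browser wants to {verb} {target} on {domain}"
    if len(caption) > CAPTION_LIMIT:
        caption = caption[: CAPTION_LIMIT - 3] + "..."

    keyboard = {
        "inline_keyboard": [[
            {"text": "Approve", "callback_data": approve},
            {"text": "Deny", "callback_data": deny},
        ]]
    }
    return {
        "chat_id": chat_id,
        "caption": caption,
        "reply_markup": json.dumps(keyboard),
    }


def parse_callback(data: str) -> tuple[str, str]:
    """Split callback data back into (decision, action_id)."""
    prefix, decision, action_id = data.split(":", 2)
    if prefix != "browser" or decision not in {"approve", "deny"}:
        raise ValueError(f"not a browser approval callback: {data!r}")
    return decision, action_id
```

Keeping only a short action id in `callback_data` leaves the full action details (URL, selector, values) on the server side, which fits the 64-byte limit and avoids echoing sensitive input through Telegram.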

## Setup & onboarding
- Disabled by default; surfaced as an **optional wizard step** that, when enabled, stands up the
  sidecar and prompts for the per-domain rules.
- A clear "what works / what may be blocked" note in the UI sets expectations up front.

## Acceptance criteria
- A page can be loaded and read/screenshotted headlessly.
- A simple authenticated action works against a site using a persisted profile.
- A first-time site login can be completed through a documented, **mobile-followable** flow; the user
  can watch and approve browser actions from Telegram via screenshots + buttons.
- The admin browser settings are usable at phone width.
- When disabled, the capability is invisible (not advertised to the model); when enabled,
  state-changing actions require approval.
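
The "invisible when disabled" criterion lends itself to an automated check. The sketch below shows the idea with a stand-in registry; `OptionalTool`, `ToolRegistry`, and `advertised` are hypothetical names and do not reflect the real `core/tools.py` interface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OptionalTool:
    name: str
    description: str
    enabled: bool = False  # opt-in: every optional tool starts disabled


class ToolRegistry:
    """Stand-in for the optional-tools registry."""

    def __init__(self) -> None:
        self._tools: dict[str, OptionalTool] = {}

    def register(self, tool: OptionalTool) -> None:
        self._tools[tool.name] = tool

    def set_enabled(self, name: str, enabled: bool) -> None:
        self._tools[name].enabled = enabled

    def advertised(self) -> list[dict]:
        """Tool descriptions handed to the model: enabled tools only."""
        return [
            {"name": t.name, "description": t.description}
            for t in self._tools.values()
            if t.enabled
        ]


registry = ToolRegistry()
registry.register(OptionalTool("browser", "Headless browser: goto, read, screenshot, click, fill, submit"))

# Disabled by default, so nothing about the browser reaches the model.
assert registry.advertised() == []

registry.set_enabled("browser", True)
assert [t["name"] for t in registry.advertised()] == ["browser"]
```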

## Related
- Shares the sandbox sidecar with: pi.dev coding harness.
- Screenshot reading on non-vision models depends on: vision fallback.
- Profiles/sessions are a natural fit for: secrets vault.

