Modal-aware occlusion and viewport context in ariaTree #435

## Current state

Pilo's ariaTree (`packages/core/src/browser/ariaTree/ariaSnapshot.ts`) generates a tree of all CSS-visible, accessible elements with `getBoundingClientRect()` width and height > 0. Two consequences:

### 1. No modal-aware filtering

When a modal/dialog is open on top of the page, the underlying page's elements **also** get refs in the snapshot. Example: a "Sign in" modal covers a checkout page. The snapshot contains both the modal's `<input>`/buttons AND the underlying page's "Buy now" / "Continue" / etc. buttons. The LLM has to guess which layer to interact with.

The system prompt papers over this with guidance (`prompts.ts:339`):

> Clear obstructing modals/popups first

But the model often clicks the *wrong* layer first because the underlying elements look like valid targets in the tree.

### 2. No viewport / scroll-position context

The snapshot includes elements far below the fold — anything with non-zero size is in the tree, regardless of whether the user can see it. The model has no signal for:

- "Is element E47 currently visible, or 3000px down?"
- "Has the page been scrolled? How much room is below?"
- "Is there content above I might need to scroll up to?"

The LLM uses cues like "this is at the top of the tree, so probably near the top of the page" — which is mostly true but breaks for `position: fixed` headers, sticky sidebars, modal portals that render at the end of `<body>` etc.

## The gap

**Modal blindness** is a real source of agent failure: the model fills the wrong form, clicks the wrong button, sees errors about elements being obscured. Pilo's `BROWSER_ACTION_COMPLETED` error on `Element is not visible` or `obstructed by another element` traces back to this.

**Lack of scroll context** makes scroll decisions blind: the model doesn't know when to scroll or how far. If a scroll tool is added (see separate issue) without this context, the model still cannot tell when or how far to scroll.

## Proposed scope

### A. Modal-aware occlusion in ariaTree

During tree generation in `ariaSnapshot.ts`, detect "modal-like" elements:

```ts
function isModalLike(element: Element, role: string, _props: Record<string, unknown>): boolean {
  // ARIA: explicit dialog/alertdialog with aria-modal="true"
  if ((role === "dialog" || role === "alertdialog") && element.getAttribute("aria-modal") === "true") {
    return true;
  }
  // HTML <dialog> opened via showModal(); only those match the :modal pseudo-class
  if (element.tagName === "DIALOG" && (element as HTMLDialogElement).open) {
    try {
      if (element.matches(":modal")) return true;
    } catch {
      // Engines without :modal support throw on the selector; fall through to the heuristic
    }
  }
  // Heuristic: large fixed/absolute overlay covering >70% of the viewport in both dimensions
  const style = window.getComputedStyle(element);
  if (style.position === "fixed" || style.position === "absolute") {
    const rect = element.getBoundingClientRect();
    if (rect.width > window.innerWidth * 0.7 && rect.height > window.innerHeight * 0.7) {
      // Plus a high z-index OR being the last element child of <body> (portal heuristic)
      const zIndex = Number.parseInt(style.zIndex, 10); // NaN when z-index is "auto"
      if (zIndex > 1000) return true;
      if (document.body.lastElementChild === element) return true;
    }
  }
  return false;
}
```

When a modal-like element is detected during tree generation:

- Mark the modal's subtree normally (refs assigned).
- Mark non-modal subtrees with an `[obscured]` property OR drop their refs entirely (so the LLM can't target them).
- Add a note at the top of the YAML output: `# A modal is open. Only modal elements have refs.`

The `[obscured]` approach is preferred because it preserves the structural context — the model sees that other elements exist but understands they're not interactable right now.
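The marking pass could look roughly like the sketch below. It assumes a simplified node shape (`SketchNode`, hypothetical; the real `AriaNode` in `types.ts` may differ) in which a detector such as `isModalLike` has already set a `modal` flag. The names `markObscured`, `obscure`, and `dropRefs` are illustrative, not existing Pilo APIs.

```typescript
// Hypothetical, simplified node shape for illustration only.
interface SketchNode {
  role: string;
  name?: string;
  ref?: string;
  modal?: boolean; // assumed to be set earlier by a detector such as isModalLike
  obscured?: boolean;
  children: SketchNode[];
}

// True if this node or any descendant is flagged as a modal.
function containsModal(node: SketchNode): boolean {
  return node.modal === true || node.children.some(containsModal);
}

// Returns a copy of the subtree with every node marked obscured.
// With dropRefs = true the refs are removed as well (the alternative option).
function obscure(node: SketchNode, dropRefs: boolean): SketchNode {
  const copy: SketchNode = {
    ...node,
    obscured: true,
    children: node.children.map((child) => obscure(child, dropRefs)),
  };
  if (dropRefs) delete copy.ref;
  return copy;
}

// Leaves the modal subtree and its ancestors untouched; marks everything else.
// If no modal is present, the tree is returned unchanged.
function markObscured(node: SketchNode, dropRefs = false): SketchNode {
  if (node.modal === true || !containsModal(node)) return node;
  return {
    ...node,
    children: node.children.map((child) =>
      child.modal === true || containsModal(child)
        ? markObscured(child, dropRefs)
        : obscure(child, dropRefs),
    ),
  };
}
```

Ancestors of the modal are left unmarked here so the path to the modal stays readable; whether they should keep their own refs is an open design choice.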

### B. Viewport / scroll context in the per-step user message

Extend `getTreeWithRefs` (or the snapshot wrapper in `webAgent.ts:779-868`) to also return:

```ts
{
  yaml: string;
  viewport: {
    scrollY: number;
    docHeight: number;
    viewportHeight: number;
    pagesAbove: number;  // floor(scrollY / viewportHeight)
    pagesBelow: number;  // max(0, floor((docHeight - scrollY - viewportHeight) / viewportHeight))
    atTop: boolean;
    atBottom: boolean;
  };
}
```
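A minimal sketch of how these fields might be derived from three raw numbers. The function name `computeViewportMetrics` and the `epsilon` tolerance are assumptions for illustration. In the page, the inputs would plausibly come from `window.scrollY`, `document.documentElement.scrollHeight`, and `window.innerHeight`.

```typescript
interface ViewportMetrics {
  scrollY: number;
  docHeight: number;
  viewportHeight: number;
  pagesAbove: number;
  pagesBelow: number;
  atTop: boolean;
  atBottom: boolean;
}

// Pure helper so it can be unit-tested without a browser.
// epsilon (assumed, in px) absorbs sub-pixel scroll offsets.
function computeViewportMetrics(
  scrollY: number,
  docHeight: number,
  viewportHeight: number,
  epsilon = 1,
): ViewportMetrics {
  const safeViewport = Math.max(viewportHeight, 1); // avoid division by zero
  const above = Math.max(scrollY, 0);
  // Clamp at 0: documents shorter than the viewport have nothing below.
  const below = Math.max(docHeight - scrollY - viewportHeight, 0);
  return {
    scrollY,
    docHeight,
    viewportHeight,
    pagesAbove: Math.floor(above / safeViewport),
    pagesBelow: Math.floor(below / safeViewport),
    atTop: above <= epsilon,
    atBottom: below <= epsilon,
  };
}
```

Clamping matters for short pages: without it, a 500px document in a 1000px viewport would report a negative `pagesBelow`.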

In the page-snapshot prompt template (`prompts.ts:475-500`), insert before the tree:

```
Page position: {pagesAbove} viewport(s) above, {pagesBelow} viewport(s) below.
{% if atTop %}You are at the top of the page.{% endif %}
{% if atBottom %}You are at the bottom of the page.{% endif %}
```
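If the prompt is assembled in plain TypeScript rather than through a template engine, the same text could be produced by a small helper. This is a sketch only; `renderPagePosition` and `PagePosition` are hypothetical names, and the actual templating mechanism in `prompts.ts` may differ.

```typescript
interface PagePosition {
  pagesAbove: number;
  pagesBelow: number;
  atTop: boolean;
  atBottom: boolean;
}

// Builds the "Page position" lines that precede the tree in the prompt.
function renderPagePosition(p: PagePosition): string {
  const lines = [
    `Page position: ${p.pagesAbove} viewport(s) above, ${p.pagesBelow} viewport(s) below.`,
  ];
  if (p.atTop) lines.push("You are at the top of the page.");
  if (p.atBottom) lines.push("You are at the bottom of the page.");
  return lines.join("\n");
}
```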

### C. Optional: surface offscreen interactive elements as hints

When some interactive elements (buttons, inputs, links) are below the viewport, surface a short hint:

```
Page position: 0 viewports above, 5 viewports below.
There are interactive elements below the viewport not shown in detail. Scroll down to see them.
```

Initially do not list specifics — just signal their presence. A future refinement could surface accessible names of below-fold interactive elements with `[offscreen]` markers, but that grows the prompt and is not always useful.
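One way the presence check might work, sketched over plain data so it can be tested outside the browser. The `RectEntry` shape, the `INTERACTIVE_ROLES` set, and the `offscreenHint` name are assumptions. `top` is taken to be viewport-relative, as `getBoundingClientRect().top` would report.

```typescript
// Hypothetical per-element record collected during the tree walk.
interface RectEntry {
  role: string;
  top: number; // viewport-relative, e.g. from getBoundingClientRect().top
}

// Assumed set of roles counted as interactive; adjust to match the real tree.
const INTERACTIVE_ROLES = new Set(["button", "link", "textbox", "checkbox", "combobox"]);

// Returns the hint line if any interactive element starts below the viewport, else null.
function offscreenHint(entries: RectEntry[], viewportHeight: number): string | null {
  const anyBelow = entries.some(
    (entry) => INTERACTIVE_ROLES.has(entry.role) && entry.top >= viewportHeight,
  );
  return anyBelow
    ? "There are interactive elements below the viewport not shown in detail. Scroll down to see them."
    : null;
}
```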

## Implementation notes

- The modal detection runs in-page (same context as ariaTree). The bundle in `bundle.ts` needs to be regenerated after adding `isModalLike` to `ariaSnapshot.ts` — that happens via `scripts/bundle-aria-tree.ts` at build time.
- The heuristic-based modal detection (fixed/absolute + large + high z-index) is fragile. ARIA-based detection (`role="dialog"` + `aria-modal="true"`) is reliable when present. Most modern UI libraries set these correctly; some legacy/custom-built modals don't. Combine both.
- Some pages have "soft modals" (a panel that takes focus but doesn't strictly block the rest of the page). The agent should still be able to interact with the rest. Conservative rule: only trigger occlusion behavior on `aria-modal="true"` or a `<dialog>` that matches `:modal`. Heuristic-based detection emits a comment but does NOT mark non-modal elements obscured (just a warning the model can use).
- Viewport metrics computation is cheap. The data flows through `getTreeWithRefs` → `addPageSnapshot` → `buildPageSnapshotPrompt`.
- Testing modal occlusion: pick 2-3 real sites with modals (a cookie banner, a sign-in modal, a confirmation dialog). Verify the snapshot reflects the modal-only state when one is open.
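
The conservative rule from the notes above can be expressed as a small decision function. This is a sketch under assumptions: `ModalSignal`, `OcclusionDecision`, and `decideOcclusion` are hypothetical names, and the wording of the heuristic warning comment is invented for illustration.

```typescript
// Which detection path fired, strongest first.
type ModalSignal = "aria-modal" | "dialog-modal" | "heuristic" | "none";

interface OcclusionDecision {
  markObscured: boolean; // whether non-modal elements get [obscured]
  headerComment: string | null; // comment line to put at the top of the YAML
}

function decideOcclusion(signal: ModalSignal): OcclusionDecision {
  switch (signal) {
    case "aria-modal":
    case "dialog-modal":
      // Reliable signals: apply occlusion.
      return {
        markObscured: true,
        headerComment: "# A modal is open. Only modal elements have refs.",
      };
    case "heuristic":
      // Fragile signal: warn only, leave refs untouched. Wording is an assumption.
      return {
        markObscured: false,
        headerComment: "# A large overlay may be covering the page.",
      };
    case "none":
      return { markObscured: false, headerComment: null };
  }
}
```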

## Acceptance criteria

- ariaTree detects modals via `role="dialog"`/`role="alertdialog"` + `aria-modal="true"` and via `<dialog>` elements matching `:modal`; elements outside the modal carry `[obscured]` markers OR lose their refs (decide based on benchmark).
- Heuristic-detected modals emit a comment at the top of the snapshot (non-modal elements keep their refs, but the model has the signal).
- Per-step snapshot prompt includes scroll position context (`pagesAbove`, `pagesBelow`, `atTop`, `atBottom`).
- Tests in `packages/core/test/` cover: ARIA modal occlusion, heuristic modal warning, viewport metrics on scrolled and unscrolled pages.
- Manual smoke test: agent on a page with an open modal correctly interacts only with the modal.

## Effort estimate

2-3 days. Modal detection is the larger piece; viewport context is a few hours.

## Related issues

Pairs with the scroll-action issue (scroll context becomes actionable when the agent has a scroll tool). Independent of the others.

## Files likely affected

- `packages/core/src/browser/ariaTree/ariaSnapshot.ts` (modal detection during tree walk)
- `packages/core/src/browser/ariaTree/types.ts` (extended AriaNode props)
- `packages/core/src/browser/ariaBrowser.ts` (return type for getTreeWithRefs)
- `packages/core/src/browser/playwrightBrowser.ts` (viewport metrics return)
- `packages/core/src/webAgent.ts` (`addPageSnapshot`)
- `packages/core/src/prompts.ts` (page-snapshot template)
- `packages/core/test/`

