Tab management (switch_tab, close_tab, tab list in snapshot) #440

@lmorchard

Current state

Pilo's AriaBrowser (and its PlaywrightBrowser implementation) operates on a single page instance. When a click opens a new tab (a common pattern with target="_blank" links, "View in new tab" actions, and OAuth popup flows), Pilo's view of the world doesn't change. The original page is still this.page; the new tab is invisible.

Practical consequences:

  • The agent clicks a link, sees no change in the snapshot, and concludes "the click didn't work" — when in fact the result is on a different tab Pilo can't see.
  • OAuth flows that pop up a sign-in window can't be completed.
  • Tasks that involve cross-tab workflows (open product page in new tab, read details, come back, compare with another) aren't expressible.

The system prompt makes no mention of tabs, and the per-step snapshot doesn't surface tab info.

The gap

Tab management is a real capability gap that affects a non-trivial fraction of real-world web tasks:

  • E-commerce: "open these three product pages in tabs and compare prices" — single-tab Pilo can't do this without losing context every navigation.
  • OAuth / SSO: popup-based auth flows are common and currently fail.
  • Reference / research: "find this in source A while keeping source B open" is unrepresentable.

Even worse: when a click does open a new tab, Pilo silently doesn't notice. The agent reports completing the click; the user looking at the browser sees the result in a new tab; Pilo doesn't know it happened.

Proposed scope

A. Track multiple pages in PlaywrightBrowser

Currently PlaywrightBrowser has this.page: Page. Extend to:

private pages: Page[] = [];
private activePageIndex: number = 0;

private get page(): Page {
  return this.pages[this.activePageIndex];
}

Listen for context.on("page", newPage => ...) to detect new tabs/popups opened by the page, and push them onto pages. Give each tab a stable short ID (e.g., a sequential ID or a short random string). Don't derive it from page._guid, which is a Playwright internal.
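A minimal sketch of the tracking described above, assuming sequential IDs. TabTracker, PageLike, and the "T001" ID format are hypothetical names for illustration, not existing Pilo code; PageLike stands in for Playwright's Page so the sketch has no dependencies.

```typescript
// Hypothetical stand-in for the parts of Playwright's Page this sketch needs.
interface PageLike {
  url(): string;
}

interface TrackedTab<P> {
  id: string; // short and stable, e.g. "T001" (assumed format)
  page: P;
}

class TabTracker<P extends PageLike> {
  private tabs: TrackedTab<P>[] = [];
  private nextSeq = 1;
  private activeIndex = 0;

  /** Register a page and return its ID. Re-adding a known page returns the same ID. */
  add(page: P): string {
    const known = this.tabs.find((t) => t.page === page);
    if (known) return known.id;
    const id = "T" + String(this.nextSeq++).padStart(3, "0");
    this.tabs.push({ id, page });
    return id;
  }

  /** The active page, or undefined when nothing has been tracked yet. */
  get page(): P | undefined {
    return this.tabs[this.activeIndex]?.page;
  }

  list(): ReadonlyArray<TrackedTab<P>> {
    return this.tabs;
  }
}
```

In PlaywrightBrowser this would presumably be wired up as context.on("page", (p) => tracker.add(p)), with a page.on("close", ...) handler to drop closed tabs. Note that adding a tab never changes the active index.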

B. Add switch_tab and close_tab tools

switch_tab: tool({
  description:
    "Switch to a different open browser tab. The next snapshot will show the new tab. " +
    "Tab IDs are listed in the \"Available tabs\" section of your page snapshot.",
  inputSchema: z.object({
    tabId: z.string().describe("The tab ID from the available tabs list"),
  }),
  execute: async ({ tabId }) => {
    return performActionWithValidation(PageAction.SwitchTab, context, undefined, tabId);
  },
}),

close_tab: tool({
  description:
    "Close a browser tab. If closing the active tab, the previously-active " +
    "tab becomes active.",
  inputSchema: z.object({
    tabId: z.string().describe("The tab ID from the available tabs list"),
  }),
  execute: async ({ tabId }) => {
    return performActionWithValidation(PageAction.CloseTab, context, undefined, tabId);
  },
}),
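One way to get the "previously-active tab becomes active" behavior is to keep an activation history instead of a single index. The following is a sketch under that assumption; TabFocus and TabResult are hypothetical names, and unknown IDs produce an error result so the model gets feedback rather than an exception.

```typescript
type TabResult =
  | { ok: true; activeTabId: string | null }
  | { ok: false; error: string };

class TabFocus {
  private open: string[];    // IDs of all open tabs
  private history: string[]; // activation order, active tab last

  constructor(initialTabId: string) {
    this.open = [initialTabId];
    this.history = [initialTabId];
  }

  get activeTabId(): string | null {
    return this.history[this.history.length - 1] ?? null;
  }

  /** Record a newly opened tab without switching to it. */
  opened(tabId: string): void {
    if (this.open.indexOf(tabId) === -1) this.open.push(tabId);
  }

  switchTab(tabId: string): TabResult {
    if (this.open.indexOf(tabId) === -1) {
      return { ok: false, error: `Unknown tab ID: ${tabId}` };
    }
    // Move the tab to the end of the history so it becomes active.
    this.history = this.history.filter((id) => id !== tabId);
    this.history.push(tabId);
    return { ok: true, activeTabId: tabId };
  }

  closeTab(tabId: string): TabResult {
    if (this.open.indexOf(tabId) === -1) {
      return { ok: false, error: `Unknown tab ID: ${tabId}` };
    }
    this.open = this.open.filter((id) => id !== tabId);
    this.history = this.history.filter((id) => id !== tabId);
    // If no previously-active tab remains, fall back to any open tab.
    const fallback = this.open[0];
    if (this.history.length === 0 && fallback !== undefined) {
      this.history.push(fallback);
    }
    return { ok: true, activeTabId: this.activeTabId };
  }
}
```

Closing the last open tab leaves activeTabId as null; how Pilo should react in that case is not specified in this issue.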

C. Surface tabs in per-step snapshot prompt

Extend buildPageSnapshotPrompt (prompts.ts:475-500) to include a tabs section:

Available tabs:
  Tab A1B2 (current): example.com - "Home"
  Tab C3D4: support.example.com - "Help Center"

[rest of snapshot]

The getTreeWithRefs return shape (or a new sibling method) needs to expose the tab list. Add:

{
  activeTabId: string;
  tabs: Array<{ id: string; url: string; title: string; isActive: boolean }>;
}
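As an illustration, the tabs section could be rendered from that shape roughly as follows. formatTabsSection and hostOf are hypothetical helpers, not existing functions in prompts.ts; titles are passed through JSON.stringify so embedded quotes or newlines can't break the line format.

```typescript
interface TabInfo {
  id: string;
  url: string;
  title: string;
  isActive: boolean;
}

/** Extract the host from a URL; non-hierarchical URLs (about:blank) are returned as-is. */
function hostOf(url: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#]+)/i.exec(url);
  return match?.[1] ?? url;
}

/** Render the "Available tabs" block, or an empty string when there are no tabs. */
function formatTabsSection(tabs: TabInfo[]): string {
  if (tabs.length === 0) return "";
  const lines = tabs.map((tab) => {
    const marker = tab.isActive ? " (current)" : "";
    return `  Tab ${tab.id}${marker}: ${hostOf(tab.url)} - ${JSON.stringify(tab.title)}`;
  });
  return ["Available tabs:", ...lines].join("\n");
}
```

Showing only the host keeps the section short; whether the model needs the full URL is a judgment call this sketch doesn't settle.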

D. Surface new-tab events

When a click opens a new tab, emit a BROWSER_TAB_OPENED event with the new tab's ID and URL. The next snapshot also lists the new tab, so the model knows it exists and can switch to it.

Decision: on new tab open, do not auto-switch. The model decides whether to switch. This avoids surprises ("I clicked X and now I'm somewhere else").
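A small sketch of that decision: the handler records the tab and emits the event, but leaves the active tab untouched. The event payload shape, TabState, and handleNewTab are assumptions for illustration; only the BROWSER_TAB_OPENED name comes from this issue.

```typescript
interface TabOpenedEvent {
  type: "BROWSER_TAB_OPENED";
  tabId: string;
  url: string;
}

interface TabState {
  activeTabId: string;
  tabIds: string[];
}

/** Register a new tab and notify listeners. Never changes the active tab. */
function handleNewTab(
  state: TabState,
  tabId: string,
  url: string,
  emit: (event: TabOpenedEvent) => void,
): TabState {
  // Ignore duplicate notifications for a tab that is already tracked.
  if (state.tabIds.indexOf(tabId) !== -1) return state;
  emit({ type: "BROWSER_TAB_OPENED", tabId, url });
  return { activeTabId: state.activeTabId, tabIds: [...state.tabIds, tabId] };
}
```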

E. System prompt update

Add a best-practices bullet:

- When you click a link that opens a new tab, the new tab will appear in the Available
  tabs list in your next snapshot. Use switch_tab() to navigate to it. Close tabs you
  no longer need with close_tab() to keep the list manageable.

Implementation notes

  • Playwright's BrowserContext.on("page") event fires when a new page opens in the context. Use this to populate this.pages.
  • The active-page concept is brittle if Pilo's user (a parent app) drives the browser externally — e.g., the user manually opens a tab in the extension. Decision: enumerate pages on every snapshot via context.pages(), refresh the local list, preserve IDs for known pages, generate IDs for unknown ones. This handles externally-driven tab changes.
  • The ariaTree generation runs on this.page. Switching tabs means subsequent snapshots target the newly-active tab. That's the desired behavior.
  • Refs are scoped to a single tab: [ref=E12] on tab A1B2 is unrelated to [ref=E12] on tab C3D4, because the ariaTree counter resets per tab. The model must only use refs taken from the current tab's snapshot. The system prompt should make this explicit:
- Element refs are scoped to the current tab. After switch_tab(), use only refs from
  the new tab's snapshot.
  • Tab IDs should be short and human-readable (4 chars) so prompts don't bloat. Don't use Playwright's internal _guid.
  • Memory: every tab kept open consumes browser memory. Cap the open-tab count (e.g., 10) and reject new-tab opens beyond that, or auto-close the oldest.
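The per-snapshot refresh and the tab cap from the notes above could look roughly like this. All three function names are hypothetical; in PlaywrightBrowser, livePages would presumably come from context.pages(), and the eviction helper assumes the list is ordered oldest first.

```typescript
interface KnownTab<P> {
  id: string;
  page: P;
}

/** Rebuild the tab list from the live pages, keeping IDs of pages already known. */
function reconcileTabs<P>(
  known: KnownTab<P>[],
  livePages: P[],
  newId: () => string,
): KnownTab<P>[] {
  return livePages.map((page) => {
    const existing = known.find((t) => t.page === page);
    return existing ?? { id: newId(), page };
  });
}

/** Keep the active ID if its tab still exists; otherwise fall back to the first tab. */
function resolveActiveTabId<P>(tabs: KnownTab<P>[], activeId: string | null): string | null {
  if (activeId !== null && tabs.some((t) => t.id === activeId)) return activeId;
  return tabs[0]?.id ?? null;
}

/** Tabs to close when over the cap: earliest non-active tabs first. */
function tabsToEvict<P>(tabs: KnownTab<P>[], activeId: string | null, maxTabs: number): KnownTab<P>[] {
  const excess = tabs.length - maxTabs;
  if (excess <= 0) return [];
  return tabs.filter((t) => t.id !== activeId).slice(0, excess);
}
```

Pages that were closed externally simply drop out of the list because they no longer appear in livePages, and pages opened externally get fresh IDs on the next refresh.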

Acceptance criteria

  • New tabs opened by click events are tracked and visible to the model in the next snapshot.
  • switch_tab and close_tab tools work and update the active page.
  • Per-step snapshot prompt surfaces the tabs list with IDs, URLs, titles, active marker.
  • BROWSER_TAB_OPENED event fires when a new tab is detected.
  • The system prompt informs the model about tab scoping of refs.
  • Tests cover: click opens a new tab, switch to new tab, snapshot reflects new tab's content, close active tab and re-promote, close non-active tab, ref scoping across tabs.
  • Manual smoke test: an OAuth-style popup flow completes.

Effort estimate

3-4 days. Tab tracking and switching are straightforward; the prompt integration and per-tab ref scoping require care. Cross-tab edge cases (close active vs. inactive, externally-driven changes) add testing complexity.

Related issues

Independent of the others. Pairs naturally with the modal-aware occlusion work (both expose more accurate page state) and with the action-vocab additions (tab management is a "missing capability" theme).

Files likely affected

  • packages/core/src/browser/playwrightBrowser.ts (multi-page tracking, tab handlers)
  • packages/core/src/browser/ariaBrowser.ts (PageAction enum, interface extensions)
  • packages/core/src/tools/webActionTools.ts (new tools)
  • packages/core/src/prompts.ts (snapshot template, tool examples, best practices)
  • packages/core/src/webAgent.ts (addPageSnapshot to include tabs)
  • packages/core/src/events.ts (new tab events)
  • packages/core/test/

Metadata


Labels: enhancement (New feature or request)
