Current state
Pilo's AriaBrowser (and its PlaywrightBrowser implementation) operates on a single page instance. A click can open a new tab, which is a common pattern with target="_blank" links, "View in new tab" actions, and OAuth popup flows. When that happens, Pilo's view of the world doesn't change: the original page is still this.page, and the new tab is invisible.
Practical consequences:
- The agent clicks a link, sees no change in the snapshot, and concludes "the click didn't work" — when in fact the result is on a different tab Pilo can't see.
- OAuth flows that pop up a sign-in window can't be completed.
- Tasks that involve cross-tab workflows (open product page in new tab, read details, come back, compare with another) aren't expressible.
The system prompt makes no mention of tabs, and the per-step snapshot doesn't surface tab info.
The gap
Tab management is a real capability gap that affects a non-trivial fraction of real-world web tasks:
- E-commerce: "open these three product pages in tabs and compare prices." Single-tab Pilo can't do this without losing context on every navigation.
- OAuth / SSO: popup-based auth flows are common and currently fail.
- Reference / research: "find this in source A while keeping source B open" is unrepresentable.
Even worse: when a click does open a new tab, Pilo silently doesn't notice. The agent reports completing the click; the user looking at the browser sees the result in a new tab; Pilo doesn't know it happened.
Proposed scope
A. Track multiple pages in PlaywrightBrowser
Currently PlaywrightBrowser has this.page: Page. Extend to:
private pages: Page[] = [];
private activePageIndex: number = 0;
// Existing this.page call sites keep working and now target the active tab.
private get page(): Page {
  return this.pages[this.activePageIndex];
}
Listen for context.on("page", newPage => ...) to detect new tabs and popups opened in the browser context, and push them onto pages. Give each tab a stable short ID, e.g., a sequential ID. Avoid deriving it from Playwright's internal page._guid (see implementation notes).
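A minimal sketch of the bookkeeping follows. It assumes pages are tracked by object identity and that IDs are sequential ("T1", "T2", ...). `TabRegistry` and the ID format are hypothetical names for illustration. The generic `P` stands in for Playwright's `Page`, so the sketch runs without a browser.

One design choice differs from the `activePageIndex` field above: the sketch stores the active tab by ID rather than by array index. Indices shift whenever a tab before the active one is closed, while IDs do not.

```typescript
// Hypothetical registry mapping live pages (e.g. Playwright Page objects)
// to short, stable tab IDs, tracking the active tab by ID, not by index.
class TabRegistry<P> {
  private nextSeq = 1;
  private idByPage = new Map<P, string>();
  private pageById = new Map<string, P>();
  private activeId: string | undefined = undefined;

  // Call from context.on("page", ...). Returns the page's tab ID; a page that
  // is already registered keeps the ID it was given the first time.
  register(page: P): string {
    const existing = this.idByPage.get(page);
    if (existing !== undefined) return existing;
    // "T" plus up to three digits stays within 4 characters through T999.
    const id = "T" + String(this.nextSeq++);
    this.idByPage.set(page, id);
    this.pageById.set(id, page);
    // Only the first page becomes active automatically; later tabs open in
    // the background until switchTo() is called.
    if (this.activeId === undefined) this.activeId = id;
    return id;
  }

  get activeTabId(): string | undefined {
    return this.activeId;
  }

  // The page that snapshots and actions should target.
  get activePage(): P | undefined {
    return this.activeId === undefined ? undefined : this.pageById.get(this.activeId);
  }

  switchTo(id: string): void {
    if (!this.pageById.has(id)) throw new Error(`Unknown tab ID: ${id}`);
    this.activeId = id;
  }
}
```

In use, `register()` would run for every page the context reports. `switchTo()` would back the switch_tab tool.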
B. Add switch_tab and close_tab tools
switch_tab: tool({
  description:
    "Switch to a different open browser tab. The next snapshot will show the new tab. " +
    "Available tab IDs are listed in the Available tabs section of your page snapshot.",
  inputSchema: z.object({
    tabId: z.string().describe("The tab ID from the available tabs list"),
  }),
  execute: async ({ tabId }) => {
    return performActionWithValidation(PageAction.SwitchTab, context, undefined, tabId);
  },
}),
close_tab: tool({
  description:
    "Close a browser tab. If closing the active tab, the previously-active " +
    "tab becomes active.",
  inputSchema: z.object({
    tabId: z.string().describe("The tab ID from the available tabs list"),
  }),
  execute: async ({ tabId }) => {
    return performActionWithValidation(PageAction.CloseTab, context, undefined, tabId);
  },
}),
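The close_tab description promises that closing the active tab re-activates the previously-active tab. One way to provide that is to keep a most-recently-used ordering of tab IDs. The helper below, `TabFocusHistory`, is a hypothetical sketch and not existing Pilo code. Tabs opened in the background are recorded as least recent, consistent with the no-auto-switch decision below.

```typescript
// Hypothetical helper: most-recently-used ordering of tab IDs, so closing the
// active tab can re-activate the previously-active one.
class TabFocusHistory {
  // Ordered least- to most-recently active; the last entry is the active tab.
  private order: string[] = [];

  // A newly opened tab is recorded as least recent, so it only becomes
  // active on its own if it is the only tab.
  add(id: string): void {
    if (this.order.indexOf(id) === -1) this.order.unshift(id);
  }

  // switch_tab: move the tab to the most-recent position.
  activate(id: string): void {
    const i = this.order.indexOf(id);
    if (i === -1) throw new Error(`Unknown tab ID: ${id}`);
    this.order.splice(i, 1);
    this.order.push(id);
  }

  get active(): string | undefined {
    return this.order.length > 0 ? this.order[this.order.length - 1] : undefined;
  }

  // close_tab: drop the tab and return whichever tab is now active. Closing a
  // background tab leaves the active tab unchanged; closing the active tab
  // promotes the most recently active remaining tab.
  close(id: string): string | undefined {
    const i = this.order.indexOf(id);
    if (i === -1) throw new Error(`Unknown tab ID: ${id}`);
    this.order.splice(i, 1);
    return this.active;
  }
}
```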
C. Surface tabs in per-step snapshot prompt
Extend buildPageSnapshotPrompt (prompts.ts:475-500) to include a tabs section:
Available tabs:
Tab A1B2 (current): example.com - "Home"
Tab C3D4: support.example.com - "Help Center"
[rest of snapshot]
The getTreeWithRefs return shape (or a new sibling method) needs to expose the tab list. Add:
{
  activeTabId: string;
  tabs: Array<{ id: string; url: string; title: string; isActive: boolean }>;
}
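Given that shape, rendering the tabs section is a small formatting step. In the sketch below, `formatTabsSection` and `displayHost` are hypothetical names. Showing only the host (as in the example above) is an assumption made to keep prompts short. URLs without a host, such as about:blank, fall back to the raw string.

```typescript
// Tab list shape proposed above.
interface TabInfo {
  id: string;
  url: string;
  title: string;
  isActive: boolean;
}

interface TabsInfo {
  activeTabId: string;
  tabs: TabInfo[];
}

// Show only the host to keep the prompt short; fall back to the raw URL for
// values like "about:blank" that have no host or fail to parse.
function displayHost(url: string): string {
  try {
    const host = new URL(url).hostname;
    return host || url;
  } catch {
    return url;
  }
}

// Hypothetical renderer for the "Available tabs" block of the snapshot prompt.
function formatTabsSection(info: TabsInfo): string {
  const lines = info.tabs.map((tab) => {
    const marker = tab.isActive ? " (current)" : "";
    // JSON.stringify quotes the title and escapes any embedded quotes.
    return `Tab ${tab.id}${marker}: ${displayHost(tab.url)} - ${JSON.stringify(tab.title)}`;
  });
  return ["Available tabs:", ...lines].join("\n");
}
```

For the two tabs in the example above, this produces exactly the "Available tabs:" block shown, minus the trailing snapshot.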
D. Surface new-tab events
When a click opens a new tab, emit a BROWSER_TAB_OPENED event with the new tab's ID and URL. The next snapshot also reflects this so the model knows to switch.
Decision: on new tab open, do not auto-switch. The model decides whether to switch. This avoids surprises ("I clicked X and now I'm somewhere else").
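The sketch below shows the no-auto-switch decision in isolation. A new page is recorded and announced, but the active tab does not move. `NewTabNotifier` and the listener API are hypothetical. The event payload beyond the tab ID and URL is also an assumption.

One caveat applies when wiring this to Playwright's context.on("page", ...). A freshly created page may still be loading, so its url() can be about:blank at that moment. Awaiting the page's waitForLoadState() before emitting may be worth considering.

```typescript
// Event payload; the proposal specifies only the new tab's ID and URL.
interface TabOpenedEvent {
  type: "BROWSER_TAB_OPENED";
  tabId: string;
  url: string;
}

type TabOpenedListener = (event: TabOpenedEvent) => void;

// Hypothetical tracker; in PlaywrightBrowser, handleNewPage() would be called
// from context.on("page", ...) with the new page's url().
class NewTabNotifier {
  private listeners: TabOpenedListener[] = [];
  private tabs: { id: string; url: string }[] = [];
  private nextSeq = 1;
  activeTabId: string | undefined = undefined;

  onTabOpened(listener: TabOpenedListener): void {
    this.listeners.push(listener);
  }

  listTabs(): { id: string; url: string }[] {
    return this.tabs.slice();
  }

  // Records the page and notifies listeners, but deliberately leaves
  // activeTabId unchanged: switching is left to the model via switch_tab().
  handleNewPage(url: string): string {
    const id = "T" + String(this.nextSeq++);
    this.tabs.push({ id, url });
    if (this.activeTabId === undefined) {
      // The initial page becomes active and is not reported as a new tab.
      this.activeTabId = id;
      return id;
    }
    const event: TabOpenedEvent = { type: "BROWSER_TAB_OPENED", tabId: id, url };
    for (const listener of this.listeners) listener(event);
    return id;
  }
}
```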
E. System prompt update
Add a best-practices bullet:
- When you click a link that opens a new tab, the new tab will appear in the Available
  tabs list in your next snapshot. Use switch_tab() to switch to it. Close tabs you
  no longer need with close_tab() to keep the list manageable.
Implementation notes
- Playwright's BrowserContext.on("page") event fires when a new page opens in the context. Use this to populate this.pages.
- The active-page concept is brittle if Pilo's user (a parent app) drives the browser externally, e.g., the user manually opens a tab in the extension. Decision: enumerate pages on every snapshot via context.pages(), refresh the local list, preserve IDs for known pages, and generate IDs for unknown ones. This handles externally-driven tab changes.
- The ariaTree generation runs on this.page. Switching tabs means subsequent snapshots target the newly-active tab. That's the desired behavior.
- Element refs are scoped to a single tab: [ref=E12] on tab A1B2 is unrelated to [ref=E12] on tab C3D4, because the ariaTree counter resets per tab. The model must only use refs from the current tab's snapshot. The system prompt should make this explicit:
  - Element refs are scoped to the current tab. After switch_tab(), use only refs from
    the new tab's snapshot.
- Tab IDs should be short and human-readable (4 chars) so prompts don't bloat. Don't use Playwright's internal _guid.
- Memory: every tab kept open consumes browser memory. Cap the open-tab count (e.g., 10) and reject new-tab opens beyond that, or auto-close the oldest.
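The per-snapshot enumeration and the tab cap can be sketched as one reconciliation pass. The pass preserves IDs for known pages, mints IDs for externally opened ones, and drops closed ones. It then picks the oldest non-active tabs to close when over the cap.

The sketch is a hypothetical helper under stated assumptions. `reconcileTabs` and its parameters are illustrative names. It implements the "auto-close the oldest" option rather than rejecting new tabs. It assumes the live page list (e.g. from context.pages()) is ordered oldest first. The caller would do the actual closing, for example with Playwright's page.close().

```typescript
// Result of one reconciliation pass (hypothetical shape).
interface ReconcileResult<P> {
  // Live pages that remain open, each with a preserved or newly minted ID.
  ids: Map<P, string>;
  // Pages the caller should close to stay within the cap, oldest first.
  toClose: P[];
}

// Hypothetical pass run before each snapshot. `live` would come from
// context.pages(); `known` is the ID map kept from the previous pass.
function reconcileTabs<P>(
  live: P[],
  known: Map<P, string>,
  makeId: () => string,
  maxTabs: number,
  active: P | undefined,
): ReconcileResult<P> {
  const ids = new Map<P, string>();
  // Keep IDs for pages seen before and mint IDs for pages opened externally.
  // Known pages missing from `live` were closed, so they are simply dropped.
  for (const page of live) {
    ids.set(page, known.get(page) ?? makeId());
  }

  // Enforce the cap by auto-closing the oldest tabs, never the active one.
  // Assumes `live` is ordered oldest first.
  const toClose: P[] = [];
  let excess = live.length - maxTabs;
  for (const page of live) {
    if (excess <= 0) break;
    if (page === active) continue;
    toClose.push(page);
    ids.delete(page);
    excess--;
  }
  return { ids, toClose };
}
```

Keeping the pass pure (it returns what to close rather than closing pages itself) makes the externally-driven cases listed in the acceptance criteria testable without a browser.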
Acceptance criteria
- New tabs opened by click events are tracked and visible to the model in the next snapshot.
- switch_tab and close_tab tools work and update the active page.
- Per-step snapshot prompt surfaces the tabs list with IDs, URLs, titles, active marker.
- BROWSER_TAB_OPENED event fires when a new tab is detected.
- The system prompt informs the model about tab scoping of refs.
- Tests cover: click opens a new tab, switch to new tab, snapshot reflects new tab's content, close active tab and re-promote, close non-active tab, ref scoping across tabs.
- Manual smoke test: an OAuth-style popup flow completes.
Effort estimate
3-4 days. Tab tracking and switching are straightforward; the prompt integration and per-tab ref scoping require care. Cross-tab edge cases (close active vs. inactive, externally-driven changes) add testing complexity.
Related issues
Independent of the others. Pairs naturally with the modal-aware occlusion work (both expose more accurate page state) and with the action-vocab additions (tab management is a "missing capability" theme).
Files likely affected
packages/core/src/browser/playwrightBrowser.ts (multi-page tracking, tab handlers)
packages/core/src/browser/ariaBrowser.ts (PageAction enum, interface extensions)
packages/core/src/tools/webActionTools.ts (new tools)
packages/core/src/prompts.ts (snapshot template, tool examples, best practices)
packages/core/src/webAgent.ts (addPageSnapshot to include tabs)
packages/core/src/events.ts (new tab events)
packages/core/test/