feat(core): add scroll tool #445

Open
lmorchard wants to merge 2 commits into main from worktree-add-scroll-tool

@lmorchard
Collaborator

Summary

Add a scroll tool so the agent has a real way to reveal content on dynamic / infinite-scroll pages, instead of being stuck when the accessibility tree appears incomplete.

  • New PageAction.Scroll in ariaBrowser.ts, with a ScrollDirection type ("up" | "down" | "top" | "bottom") and a SCROLL_DIRECTIONS constant the Zod schema reuses.
  • PlaywrightBrowser dispatches scroll via page.evaluate(direction => window.scrollBy / window.scrollTo) so it works in remote-CDP setups.
  • ExtensionBrowser uses browser.scripting.executeScript for the same effect — parity across runtimes.
  • scroll tool: scroll({direction}) with four values:
    • down / up — scroll one viewport in that direction
    • top / bottom — jump to the start / end of the document
  • Persona prompt's "if expected data isn't visible" guidance now points at the new scroll() action instead of describing a missing capability.
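The scroll semantics described above can be sketched in a few lines. This assumes the in-page code reads `window.innerHeight` for "one viewport" and `document.documentElement.scrollHeight` for the end of the document. `SCROLL_DIRECTIONS` and `ScrollDirection` follow the names in this PR. `applyScroll` and the `ScrollTarget` interface are hypothetical, introduced only so the logic can be exercised against a fake window. Playwright serializes a function passed to `page.evaluate` and runs it in the page, so that function cannot call Node-side helpers. In practice the switch would be inlined in that callback, and the same applies to the `func` given to `executeScript`.

```typescript
// Names from the PR: SCROLL_DIRECTIONS, ScrollDirection.
// Hypothetical (illustration only): ScrollTarget, applyScroll.
export const SCROLL_DIRECTIONS = ["up", "down", "top", "bottom"] as const;
export type ScrollDirection = (typeof SCROLL_DIRECTIONS)[number];

// The subset of `window` the scroll needs; a real Window satisfies it,
// and a fake can stand in for it in tests.
export interface ScrollTarget {
  innerHeight: number;
  scrollBy(x: number, y: number): void;
  scrollTo(x: number, y: number): void;
  document: { documentElement: { scrollHeight: number } };
}

// Assumed in-page behavior for each direction.
export function applyScroll(direction: ScrollDirection, win: ScrollTarget): void {
  switch (direction) {
    case "down":
      win.scrollBy(0, win.innerHeight); // one viewport further down
      return;
    case "up":
      win.scrollBy(0, -win.innerHeight); // one viewport back up
      return;
    case "top":
      win.scrollTo(0, 0); // jump to the start of the document
      return;
    case "bottom":
      // Jump to the end of the document as currently laid out.
      win.scrollTo(0, win.document.documentElement.scrollHeight);
      return;
  }
}
```

`down` and `up` are relative moves (`scrollBy`), while `top` and `bottom` are absolute (`scrollTo`). Because of that split, repeated `down` calls keep advancing on an infinite-scroll page, whereas `bottom` only reaches whatever height has loaded so far.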

Split out of #444 (item F of issue #431). The issue said to either land scroll first and keep the prompt guidance, or strip the guidance until scroll lands. This PR takes the "land scroll" path, so once both PRs merge, the misleading prompt text that item F flagged in #444 is gone.

Test plan

  • pnpm run check passes (typecheck + format + 1249 tests across core/cli/server/extension)
  • New tests assert:
    • scroll tool dispatches PageAction.Scroll with each of the four directions
    • Zod schema rejects unknown directions ("left", "", {})
    • PlaywrightBrowser calls page.evaluate with the direction
    • PlaywrightBrowser throws BrowserActionException when direction is missing
    • PageAction enum + non-element categorization lists include Scroll
  • gitleaks protect --staged clean
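The rejection cases in the test plan can be illustrated with a dependency-free stand-in for the schema. The PR reuses `SCROLL_DIRECTIONS` in a Zod schema; with Zod that is typically written as `z.enum(SCROLL_DIRECTIONS)`. So that the sketch runs without Zod, it uses a hypothetical `parseScrollArgs` guard in place of the schema, and a hypothetical `BrowserActionError` in place of `BrowserActionException`.

```typescript
// Hypothetical stand-ins: parseScrollArgs replaces the Zod schema, and
// BrowserActionError replaces the PR's BrowserActionException.
export const SCROLL_DIRECTIONS = ["up", "down", "top", "bottom"] as const;
export type ScrollDirection = (typeof SCROLL_DIRECTIONS)[number];

export class BrowserActionError extends Error {}

// Accept only one of the four known strings; reject "left", "", objects, etc.
export function isScrollDirection(value: unknown): value is ScrollDirection {
  return typeof value === "string" && SCROLL_DIRECTIONS.some((d) => d === value);
}

export function parseScrollArgs(args: { direction?: unknown }): ScrollDirection {
  if (args.direction === undefined) {
    // Mirrors the "throws when direction is missing" case.
    throw new BrowserActionError("scroll requires a direction");
  }
  if (!isScrollDirection(args.direction)) {
    throw new BrowserActionError(`unknown scroll direction: ${JSON.stringify(args.direction)}`);
  }
  return args.direction;
}
```

Deriving both the type and the runtime check from one `as const` tuple means a fifth direction only has to be added in one place. That matches the PR's reason for having the schema reuse `SCROLL_DIRECTIONS`.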

🤖 Generated with Claude Code

lmorchard and others added 2 commits May 13, 2026 14:46
Add `scroll({direction})` as a first-class browser action so the agent has
a way to reveal content on dynamic / infinite-scroll pages instead of
being stuck when the accessibility tree appears incomplete.

- `PageAction.Scroll` in `ariaBrowser.ts`, with `ScrollDirection` type and
  a `SCROLL_DIRECTIONS` constant the Zod schema reuses.
- PlaywrightBrowser: scroll case using `page.evaluate` →
  `window.scrollBy` / `window.scrollTo` so it works in remote-CDP setups.
- ExtensionBrowser: matching scroll case via
  `browser.scripting.executeScript` for parity with the extension runtime.
- `scroll` tool exposes four directions: `up` / `down` (one viewport each)
  and `top` / `bottom` (jump to start / end of document).
- Updates the persona prompt's "if expected data isn't visible" guidance
  to point at the new `scroll()` action instead of describing a missing
  capability.
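Here is one possible shape for the tool-to-action dispatch, offered as a sketch. From this PR, only `PageAction.Scroll` and its inclusion in the non-element categorization come directly. The other enum members, `NON_ELEMENT_ACTIONS`, `ActionRequest`, `BrowserLike`, and `scrollTool` are hypothetical names chosen for illustration. The key idea is that scroll carries a direction but no element ref, which is why it sits with the non-element actions.

```typescript
// Hypothetical shapes: only PageAction.Scroll and its non-element status
// come from the PR; everything else is illustrative.
export enum PageAction {
  Click = "click",
  Navigate = "navigate",
  Scroll = "scroll",
}

// Actions that do not target an element ref from the accessibility tree.
export const NON_ELEMENT_ACTIONS: readonly PageAction[] = [PageAction.Navigate, PageAction.Scroll];

type ScrollDirection = "up" | "down" | "top" | "bottom";

export interface ActionRequest {
  action: PageAction;
  ref?: string; // element reference, only used by element actions
  value?: string; // e.g. a URL for Navigate, a direction for Scroll
}

// Hypothetical browser abstraction shared by the Playwright and extension runtimes.
export interface BrowserLike {
  performAction(req: ActionRequest): Promise<void>;
}

// scroll({direction}) becomes a PageAction.Scroll with no element ref.
export async function scrollTool(
  browser: BrowserLike,
  args: { direction: ScrollDirection },
): Promise<string> {
  await browser.performAction({ action: PageAction.Scroll, value: args.direction });
  return `Scrolled ${args.direction}`;
}
```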

Split out of the hygiene cleanup PR #444 (item F): the issue said either
land scroll then keep the guidance, or strip the guidance until scroll
lands. This PR takes the "land scroll" path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… before next snapshot

Scroll is most useful on dynamic pages where IntersectionObserver-driven
loads append content as the viewport moves. Without a settle, the next
aria-tree snapshot can capture the page mid-load and miss exactly what
the scroll was supposed to reveal — leaving the agent to scroll again
without ever seeing the new content.

PlaywrightBrowser waits up to 500ms for networkidle, catching the timeout
so it doesn't fail the action. ExtensionBrowser has no networkidle
equivalent, so it uses a fixed 300ms wait as the closest approximation.

Both timeouts are tight so scroll stays cheap on quiet pages.
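The two settle strategies can be sketched as follows. `waitForIdle` is a hypothetical callback standing in for something like Playwright's `page.waitForLoadState("networkidle", { timeout })`. `settleAfterScroll` and `settleFixed` are illustrative names, not the PR's code. The `catch` swallows any error from the wait, which in practice mostly means the timeout on a page that never goes quiet.

```typescript
// Hypothetical helpers illustrating the settle step after a scroll.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Playwright-style: wait for network idle, capped at timeoutMs, and never
// let a failed or timed-out wait fail the scroll itself.
export async function settleAfterScroll(
  waitForIdle: (timeoutMs: number) => Promise<void>,
  timeoutMs = 500,
): Promise<"idle" | "timed-out"> {
  try {
    await waitForIdle(timeoutMs);
    return "idle";
  } catch {
    // Busy pages (polling, beacons) may never go idle; proceed to the snapshot anyway.
    return "timed-out";
  }
}

// Extension-style: no networkidle signal is available, so wait a fixed delay.
export async function settleFixed(delayMs = 300): Promise<void> {
  await sleep(delayMs);
}
```

With Playwright, the call might look like `await settleAfterScroll((timeout) => page.waitForLoadState("networkidle", { timeout }))`. Either way the wait is capped, so a quiet page pays for a short check rather than a full load cycle.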

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>