feat(core): add scroll tool #445
Open
lmorchard wants to merge 2 commits into
Add `scroll({direction})` as a first-class browser action so the agent has
a way to reveal content on dynamic / infinite-scroll pages instead of
being stuck when the accessibility tree appears incomplete.
- `PageAction.Scroll` in `ariaBrowser.ts`, with `ScrollDirection` type and
a `SCROLL_DIRECTIONS` constant the Zod schema reuses.
- PlaywrightBrowser: scroll case using `page.evaluate` →
`window.scrollBy` / `window.scrollTo` so it works in remote-CDP setups.
- ExtensionBrowser: matching scroll case via
`browser.scripting.executeScript` for parity with the extension runtime.
- `scroll` tool exposes four directions: `up` / `down` (one viewport each)
and `top` / `bottom` (jump to start / end of document).
- Updates the persona prompt's "if expected data isn't visible" guidance
to point at the new `scroll()` action instead of describing a missing
capability.
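Here is a minimal sketch of how the direction type and the constant can stay in sync. It assumes a plain `as const` tuple, and the actual definitions in `ariaBrowser.ts` may differ. `isScrollDirection` is a hypothetical stand-in for the Zod validation, which with Zod would typically be `z.enum(SCROLL_DIRECTIONS)`. It is written by hand here so the example runs without dependencies.

```typescript
// Sketch only: mirrors the shape described in this PR ("up" | "down" | "top" | "bottom").
const SCROLL_DIRECTIONS = ["up", "down", "top", "bottom"] as const;

// Derive the union type from the constant so the two cannot drift apart.
type ScrollDirection = (typeof SCROLL_DIRECTIONS)[number];

// Hypothetical runtime guard. With Zod, the schema would reuse the same
// constant instead: z.enum(SCROLL_DIRECTIONS).
function isScrollDirection(value: unknown): value is ScrollDirection {
  return (
    typeof value === "string" &&
    (SCROLL_DIRECTIONS as readonly string[]).includes(value)
  );
}
```

Deriving `ScrollDirection` from `SCROLL_DIRECTIONS` means a new direction only has to be added in one place, and the schema and the type pick it up together.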
Split out of the hygiene cleanup PR #444 (item F): the issue offered two options,
either land scroll and keep the guidance, or strip the guidance until scroll
lands. This PR takes the "land scroll" path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… before next snapshot

Scroll is most useful on dynamic pages where IntersectionObserver-driven loads append content as the viewport moves. Without a settle, the next aria-tree snapshot can capture the page mid-load and miss exactly what the scroll was supposed to reveal, leaving the agent to scroll again without ever seeing the new content.

PlaywrightBrowser waits up to 500ms for networkidle (the timeout is caught so it doesn't fail the action). ExtensionBrowser has no networkidle equivalent, so it uses a fixed 300ms wait as the closest mirror. Both timeouts are tight so scroll stays cheap on quiet pages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
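The two settle strategies described in this commit can be sketched as follows. `page.waitForLoadState("networkidle", { timeout })` is the real Playwright method. The `LoadStateWaiter` interface and both function names are hypothetical, introduced so the sketch runs without Playwright installed. The Playwright path swallows the timeout error, so a page that never goes network-quiet still lets the scroll action succeed.

```typescript
// Hypothetical structural slice of Playwright's Page: only the method used here.
interface LoadStateWaiter {
  waitForLoadState(
    state: "networkidle",
    options: { timeout: number },
  ): Promise<void>;
}

// Playwright path: wait briefly for network quiet, but never fail the action.
async function settleWithNetworkIdle(
  page: LoadStateWaiter,
  timeoutMs = 500,
): Promise<void> {
  try {
    await page.waitForLoadState("networkidle", { timeout: timeoutMs });
  } catch {
    // Timed out (e.g. a page with constant polling); take the snapshot anyway.
  }
}

// Extension path: no networkidle signal is available, so use a fixed delay.
function settleWithFixedDelay(delayMs = 300): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, delayMs));
}
```

A caller would run one of these after dispatching the scroll and before capturing the next aria-tree snapshot.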
Summary
Add a `scroll` tool so the agent has a real way to reveal content on dynamic / infinite-scroll pages, instead of being stuck when the accessibility tree appears incomplete.

- `PageAction.Scroll` in `ariaBrowser.ts`, with a `ScrollDirection` type (`"up" | "down" | "top" | "bottom"`) and a `SCROLL_DIRECTIONS` constant the Zod schema reuses.
- PlaywrightBrowser: `page.evaluate(direction => window.scrollBy / window.scrollTo)` so it works in remote-CDP setups.
- ExtensionBrowser: `browser.scripting.executeScript` for the same effect, giving parity across runtimes.
- `scroll` tool: `scroll({direction})` with four values:
  - `down` / `up`: scroll one viewport in that direction
  - `top` / `bottom`: jump to the start / end of the document
- Persona prompt: the "if expected data isn't visible" guidance now points at the new `scroll()` action instead of describing a missing capability.

Split out of #444 (item F of issue #431). The issue said either land scroll first and keep the prompt guidance, or strip the guidance until scroll lands. This PR takes the "land scroll" path, so the still-misleading prompt text that item F leaves in #444 is resolved once both PRs merge.
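The direction-to-call mapping described above might look like the following sketch. `applyScroll` and the `ScrollableWindow` interface are hypothetical, introduced only so the logic can run outside a browser. Using `document.documentElement.scrollHeight` as the "bottom" target is also an assumption; the PR does not say which property it reads.

```typescript
type ScrollDirection = "up" | "down" | "top" | "bottom";

// Hypothetical minimal slice of the browser `window` that scrolling touches.
// In a real page this is simply `window`.
interface ScrollableWindow {
  innerHeight: number;
  scrollBy(x: number, y: number): void;
  scrollTo(x: number, y: number): void;
  document: { documentElement: { scrollHeight: number } };
}

// Maps a direction onto window.scrollBy (relative) or window.scrollTo (absolute).
function applyScroll(win: ScrollableWindow, direction: ScrollDirection): void {
  switch (direction) {
    case "down":
      win.scrollBy(0, win.innerHeight); // one viewport down
      break;
    case "up":
      win.scrollBy(0, -win.innerHeight); // one viewport up
      break;
    case "top":
      win.scrollTo(0, 0); // start of the document
      break;
    case "bottom":
      // Browsers clamp to the maximum scroll offset, so scrollHeight is safe here.
      win.scrollTo(0, win.document.documentElement.scrollHeight);
      break;
  }
}
```

Both `page.evaluate(fn, arg)` in Playwright and `browser.scripting.executeScript({ target, func, args })` in the extension serialize the function into the page. In either runtime, the switch would therefore need to be inlined in that callback against the real `window`, rather than calling a Node-side helper like `applyScroll`.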
Test plan
- `pnpm run check` passes (typecheck + format + 1249 tests across core/cli/server/extension)
- `scroll` tool dispatches `PageAction.Scroll` with each of the four directions
- Invalid directions are rejected (`"left"`, `""`, `{}`)
- `PlaywrightBrowser` calls `page.evaluate` with the direction
- `PlaywrightBrowser` throws `BrowserActionException` when direction is missing
- `PageAction` enum + non-element categorization lists include `Scroll`
- `gitleaks protect --staged` clean

🤖 Generated with Claude Code