chore(core): hygiene cleanup from issue #431 #444

Open
lmorchard wants to merge 1 commit into main from worktree-fix-431-hygiene-cleanup
@lmorchard (Collaborator) commented May 13, 2026

Summary

Bundle the items from #431 that still apply to current main. Each touches a small, focused area of packages/core.

  • A. Warn + emit SYSTEM_DEBUG_TOOL_DROP when providers return more than one tool call in a single turn, so dropped extras are observable rather than silently lost.
  • B. Replace the string-match isSetupError check with PlanningError / NoStartingUrlError subclasses (re-exported from the public API), so setup-error detection survives refactors of the underlying message text.
  • D. Bump the wait tool's upper bound from 30s to 120s and rewrite its execute to sleep directly with abort-signal polling instead of going through page.waitForTimeout (which is abort-blind). A 120s wait now responds to user aborts within ~500ms.
  • E. Drop the unused actionLoopSystemPrompt export; refactor the prompts tests to call buildActionLoopSystemPrompt(false, false) directly.
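As a rough illustration of item A, the sketch below keeps one tool call per turn and reports the extras. Only the event name and the droppedTools / keptTool payload fields come from this PR; the ToolCall shape, the selectToolCall helper, the emit callback, and the choice to keep the first call are assumptions made for the example.

```typescript
// Hypothetical shapes: only the event name and the droppedTools / keptTool
// fields are taken from the PR description. Everything else is illustrative.
interface ToolCall {
  toolName: string;
  input: unknown;
}

interface ToolDropEvent {
  type: "SYSTEM_DEBUG_TOOL_DROP";
  data: { keptTool: string; droppedTools: string[] };
}

type Emit = (event: ToolDropEvent) => void;

/**
 * Keep the first tool call of a turn (an assumption) and report any extras
 * through `emit`. Returns undefined when the provider returned no tool calls.
 */
function selectToolCall(calls: ToolCall[], emit: Emit): ToolCall | undefined {
  const [kept, ...dropped] = calls;
  if (kept !== undefined && dropped.length > 0) {
    console.warn(
      `Provider returned ${calls.length} tool calls; keeping "${kept.toolName}".`,
    );
    emit({
      type: "SYSTEM_DEBUG_TOOL_DROP",
      data: {
        keptTool: kept.toolName,
        droppedTools: dropped.map((call) => call.toolName),
      },
    });
  }
  return kept;
}
```

A turn with a single tool call emits nothing, so the event only appears when something was actually dropped.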

Item C (harden image-strip detection) was a no-op: the fallback the issue refers to (webAgent.ts:542-552) is not present in the current main lineage. git log -S "stripping images" shows it was introduced on PR #378 but didn't reach current main after the develop → main workflow switchover (PR #337). Nothing to harden.

Item F (persona prompt scroll guidance) is deferred to a follow-up PR that adds a real scroll tool — so the prompt text becomes accurate rather than stripped. Keeping that work separate so it gets its own review/changelog entry rather than being buried under a "hygiene" label.

Test plan

  • pnpm run check passes (typecheck + format + 1247 tests across core/cli/server/extension)
  • New tests assert:
    • SYSTEM_DEBUG_TOOL_DROP event payload (droppedTools / keptTool)
    • Planning failures reject with a PlanningError instance
    • wait cap accepts 120s, rejects 121s
    • wait aborts within one poll interval when abortSignal fires
  • Generated schema (schemas/webagent-event.json) regenerated for the new event
  • gitleaks protect --staged clean
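The PlanningError assertion above depends on the typed setup errors from item B. A minimal sketch of that idea follows, assuming both classes extend Error directly and that isSetupError becomes an instanceof check; the constructor signatures and the helper's exact shape are not shown in this PR and are guesses.

```typescript
// Sketch only: the class names come from the PR. The constructors and the
// isSetupError signature are assumptions.
class PlanningError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PlanningError";
    // Keeps instanceof working even if the code is compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class NoStartingUrlError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NoStartingUrlError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

/** Type-based detection: unaffected by later changes to the message text. */
function isSetupError(
  error: unknown,
): error is PlanningError | NoStartingUrlError {
  return error instanceof PlanningError || error instanceof NoStartingUrlError;
}
```

Under this sketch a plain `new Error("Failed to generate plan")` is not treated as a setup error, however its message is worded, which is the property the string-match version could not guarantee.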

Resolves #431

@lmorchard lmorchard requested a review from Copilot May 13, 2026 20:50
@lmorchard lmorchard force-pushed the worktree-fix-431-hygiene-cleanup branch 2 times, most recently from 3a1ee57 to 2959bee Compare May 13, 2026 20:57
Copilot AI (Contributor) left a comment

Pull request overview

This PR performs focused core hygiene cleanup around tool-call observability, setup error typing, wait-tool limits, and prompt/export cleanup.

Changes:

  • Adds typed planning/setup errors and a multi-tool-drop debug event.
  • Increases the wait tool cap to 120 seconds and updates prompts/tests.
  • Removes the dead actionLoopSystemPrompt export and refreshes related tests/docs.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/core/src/webAgent.ts | Emits multi-tool-drop diagnostics and uses typed setup errors. |
| packages/core/src/utils/retry.ts | Adds tool-required diagnostics and retry prompt augmentation. |
| packages/core/src/tools/webActionTools.ts | Raises wait tool validation limit to 120 seconds. |
| packages/core/src/prompts.ts | Updates wait descriptions and removes scroll guidance/dead export. |
| packages/core/src/events.ts | Adds the SYSTEM_DEBUG_TOOL_DROP event and payload type. |
| packages/core/src/errors.ts | Adds PlanningError and NoStartingUrlError. |
| packages/core/src/core.ts | Re-exports the new error classes. |
| packages/core/test/webAgent.test.ts | Adds tests for planning error typing and multi-tool-drop events. |
| packages/core/test/utils/retry.test.ts | Adds tests for tool-required retry diagnostics/augmentation. |
| packages/core/test/tools/webActionTools.test.ts | Updates wait tool description/schema tests. |
| packages/core/test/prompts.test.ts | Refactors prompt tests after removing the exported constant. |
| packages/core/test/events.test.ts | Updates expected event type list. |
| docs/dev-sessions/2026-05-13-1200-hygiene-cleanup-431/notes.md | Adds implementation notes for issue #431 cleanup. |

Comments suppressed due to low confidence (1)

packages/core/src/tools/webActionTools.ts:197

  • Raising the wait limit to 120s makes aborts much less responsive because the wait action ultimately calls page.waitForTimeout(seconds * 1000) and neither this tool nor the browser wait path observes context.abortSignal. A user abort during wait({ seconds: 120 }) can be delayed until the full wait finishes; make the wait abort-aware before increasing the cap this far.
      inputSchema: z.object({

Comment thread packages/core/src/errors.ts Outdated
Comment thread docs/dev-sessions/2026-05-13-1200-hygiene-cleanup-431/notes.md Outdated
@lmorchard lmorchard force-pushed the worktree-fix-431-hygiene-cleanup branch from 2959bee to 44986e6 Compare May 13, 2026 21:07
@lmorchard (Collaborator, Author) commented

Thanks for the review — addressed all three points:

1. Wait + abort responsiveness (suppressed comment on webActionTools.ts:197): Real concern, fixed. The wait tool no longer routes through browser.performAction → page.waitForTimeout (which is abort-blind). It now does its own 500ms-polling sleep loop that checks context.abortSignal each iteration, so user aborts during a wait({seconds: 120}) are responsive within ~500ms. New test in webActionTools.test.ts exercises the abort path.

2. NoStartingUrlError doc (errors.ts:115): Fixed. The docstring now describes the actual situation (defensive guard against an unexpectedly unset this.url after planTask should have defaulted to about:blank) rather than the user-facing scenario it implied.

3. notes.md:26 stale comment: Fixed. Updated the line-288 entry to match what the diff actually does (rewrites the Best-Practices line so it points at wait() rather than the non-existent scroll action).
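For readers who want to picture the polling loop described in point 1, here is a minimal sketch under stated assumptions. The 500ms interval and the 120s cap come from this thread; the abortableWait name, its return values, the zero lower bound, and the pollMs parameter are hypothetical and not taken from the actual webActionTools.ts.

```typescript
// Values from the PR discussion: poll every 500ms, cap waits at 120s.
const POLL_INTERVAL_MS = 500;
const MAX_WAIT_SECONDS = 120;

/**
 * Sleep for `seconds`, checking the abort signal before every slice.
 * Hypothetical helper: the real tool's execute() may be shaped differently.
 */
async function abortableWait(
  seconds: number,
  abortSignal?: AbortSignal,
  pollMs: number = POLL_INTERVAL_MS,
): Promise<"completed" | "aborted"> {
  // The lower bound of 0 is an assumption; the PR only states the upper cap.
  if (!Number.isFinite(seconds) || seconds < 0 || seconds > MAX_WAIT_SECONDS) {
    throw new RangeError(`seconds must be between 0 and ${MAX_WAIT_SECONDS}`);
  }
  const deadline = Date.now() + seconds * 1000;
  while (true) {
    if (abortSignal?.aborted) return "aborted";
    const remaining = deadline - Date.now();
    if (remaining <= 0) return "completed";
    // Sleep one slice at most, so an abort is noticed within about pollMs.
    await new Promise<void>((resolve) =>
      setTimeout(resolve, Math.min(pollMs, remaining)),
    );
  }
}
```

Polling keeps the loop simple at the cost of up to one interval of latency. Subscribing to the signal's abort event would react sooner, but that is a different design from the one described in this thread.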

@lmorchard lmorchard force-pushed the worktree-fix-431-hygiene-cleanup branch 3 times, most recently from 0eaf00a to 1d16fad Compare May 13, 2026 21:43
Bundle the items from #431 that still apply to current main:

- A. Warn + emit SYSTEM_DEBUG_TOOL_DROP when providers return more than
  one tool call in a single turn, so dropped extras are observable
  rather than silently lost.
- B. Replace the string-match `isSetupError` check with `PlanningError`
  and `NoStartingUrlError` subclasses (re-exported from the public API),
  so setup-error detection survives refactors of the underlying message
  text.
- D. Bump the `wait` tool's upper bound from 30s to 120s and rewrite
  its `execute` to sleep directly with abort-signal polling instead of
  going through `page.waitForTimeout` (which is abort-blind), so a 120s
  wait can be interrupted by a user abort within ~500ms.
- E. Drop the unused `actionLoopSystemPrompt` export; refactor the
  prompts tests to call `buildActionLoopSystemPrompt(false, false)`
  directly.

Item C (harden image-strip detection) was a no-op: the fallback the
issue refers to is not present in the current main lineage, so there is
nothing to harden.

Item F (persona prompt scroll guidance) is deferred to a follow-up PR
that adds a real `scroll` tool, so the prompt text becomes accurate
rather than being stripped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lmorchard lmorchard force-pushed the worktree-fix-431-hygiene-cleanup branch from 1d16fad to 3d5ba09 Compare May 13, 2026 21:43
@lmorchard lmorchard mentioned this pull request May 13, 2026
3 tasks
Development

Successfully merging this pull request may close these issues.

Hygiene cleanup — multi-tool drop logging, error subclasses, wait cap, dead exports