fix(browser): close orphaned browser before relaunch to stop headless_shell leak #6

Open
wrdo-dev wants to merge 1 commit into DebugBase:main from wrdo-dev:fix/browser-process-leak-on-relaunch
wrdo-dev commented Jun 2, 2026

Problem

launchBrowser() in src/browser/manager.ts assigns _browser = await chromium.launch() without closing any existing browser first. The browser model is a single shared _browser singleton, but several paths abandon a live OS process:

  1. Desync relaunch: the disconnected handler sets _browser = null, but the underlying headless_shell process can still be alive (e.g. a transient disconnect, or the JS handle dropped while the process lingers). ensureBrowser() then sees !_browser and calls launchBrowser() again → a fresh Chromium spawns and the previous process is orphaned.
  2. Post-launch error: if chromium.launch() on the primary path succeeds but a later step (newContext() / newPage()) throws, control jumps to the catch block, which launches a second browser (system-Chrome fallback) and never closes the first.

Because nothing closes the abandoned process, idle headless_shell processes accumulate — roughly one browser cluster (~6 processes on Linux) per relaunch cycle — until the host runs out of memory and the OOM-killer fires.

Observed in production (v1.1.0): glance running as a child MCP accumulated ~7 orphaned browser clusters (40+ headless_shell, ~0.6 GB unique RSS) over ~90 minutes, all idle (<1% CPU), eventually triggering a kernel OOM that took down sibling processes.

Fix

  • Add a small discardBrowser() helper — a best-effort close() that swallows errors so a close failure can never block a relaunch.
  • launchBrowser() now closes the current browser (and clears shared state) before launching a new one.
  • The fallback (system-Chrome) path discards the partially-launched primary browser before its second chromium.launch().
  • The disconnected handler now close()s the browser (deterministic Playwright-side handle cleanup) and only clears the shared _browser/_context/_pages state if it still owns _browser — so a late disconnected event from a previously-replaced browser can't null out the current one.
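The close-before-launch and ownership-guard ideas in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `BrowserLike` and `Launcher` are hypothetical stand-ins for the small slice of Playwright's `Browser` used here (`close()` and the `disconnected` event), and the launcher is injected so the sketch runs without Chromium installed.

```typescript
// Hypothetical stand-in for the parts of a Playwright Browser this sketch needs.
interface BrowserLike {
  close(): Promise<void>;
  on(event: "disconnected", handler: () => void): void;
}

// Hypothetical: in the real manager this would wrap chromium.launch().
type Launcher = () => Promise<BrowserLike>;

// Single shared browser, mirroring the singleton model described in the PR.
let _browser: BrowserLike | null = null;

// Best-effort close: errors are swallowed so a failed close never blocks a relaunch.
async function discardBrowser(browser: BrowserLike | null): Promise<void> {
  if (!browser) return;
  try {
    await browser.close();
  } catch {
    // Intentionally ignored.
  }
}

async function launchBrowser(launch: Launcher): Promise<BrowserLike> {
  // Close whatever we currently hold before spawning a replacement.
  const previous = _browser;
  _browser = null;
  await discardBrowser(previous);

  const browser = await launch();
  browser.on("disconnected", () => {
    // Clean up the handle for the browser that actually disconnected.
    void discardBrowser(browser);
    // Ownership guard: a late event from a replaced browser must not
    // clear the state that now belongs to its successor.
    if (_browser === browser) _browser = null;
  });
  _browser = browser;
  return browser;
}
```

The handler closes over its own `browser` rather than reading `_browser`, which is what lets it tell a stale event from a current one.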

The duplicated launch→context→page setup is factored into a shared attach() closure to keep the primary and fallback paths in sync.
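A minimal sketch of how a shared `attach()` closure might keep the two paths in sync while discarding a partially-launched primary browser. The names `launchWithFallback`, `Session`, `BrowserLike`, and `ContextLike` are hypothetical and the launchers are injected; the actual code in `src/browser/manager.ts` may be structured differently.

```typescript
// Hypothetical stand-ins for the Playwright objects involved.
interface ContextLike {
  newPage(): Promise<unknown>;
}
interface BrowserLike {
  newContext(): Promise<ContextLike>;
  close(): Promise<void>;
}
type Launcher = () => Promise<BrowserLike>;

interface Session {
  browser: BrowserLike;
  context: ContextLike;
  page: unknown;
}

async function launchWithFallback(
  primary: Launcher,
  fallback: Launcher,
): Promise<Session> {
  // Every browser that launch() handed back, so a failure after launch
  // still leaves us a handle to close.
  const opened: BrowserLike[] = [];

  // Shared launch -> context -> page setup used by both paths.
  const attach = async (launch: Launcher): Promise<Session> => {
    const browser = await launch();
    opened.push(browser);
    const context = await browser.newContext();
    const page = await context.newPage();
    return { browser, context, page };
  };

  try {
    return await attach(primary);
  } catch {
    // Discard the partially-launched primary browser before the second launch.
    for (const browser of opened.splice(0)) {
      try {
        await browser.close();
      } catch {
        // Best-effort: a close failure must not block the fallback.
      }
    }
    return await attach(fallback);
  }
}
```

Recording the browser before `newContext()` runs is the point of the sketch: if a later step throws, the catch block still knows which process to close.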

Verification

Built with npm run build (esbuild, clean). On a host with chromium installed, drove the manager through repeated forced disconnect → relaunch cycles (SIGKILL the browser main process, then navigate again). With this change the headless_shell process count stays flat at one browser cluster across cycles; on main it grows by a full cluster each cycle.
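The PR does not say how the process count was measured. One way to check it between cycles, assuming a Linux or macOS host where `ps -eo comm` is available, is a small helper like the one below; `countHeadlessShell` and `currentHeadlessShellCount` are hypothetical names for illustration.

```typescript
import { execFileSync } from "node:child_process";

// Count processes whose command name is headless_shell in `ps -eo comm` output.
// Some platforms print a full path in the comm column, so compare the basename.
function countHeadlessShell(psOutput: string): number {
  return psOutput
    .split("\n")
    .map((line) => line.trim().split("/").pop() ?? "")
    .filter((name) => name === "headless_shell").length;
}

// Snapshot the live process table (assumes a POSIX-style `ps` on PATH).
function currentHeadlessShellCount(): number {
  const output = execFileSync("ps", ["-eo", "comm"], { encoding: "utf8" });
  return countHeadlessShell(output);
}
```

Calling `currentHeadlessShellCount()` after each forced disconnect and relaunch should return roughly the same number with the fix applied, and a growing number without it.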

No API or behavior change — same single-browser model, just without leaking the old process on replacement.

…_shell leak

launchBrowser() assigned `_browser = await chromium.launch()` without closing
any existing browser first. When the connection state desyncs from the OS
process — the `disconnected` handler nulls `_browser` while the headless_shell
process can still be alive, or a post-launch step (newContext/newPage) throws
and the catch-block launches a *second* browser — the previous browser process
was abandoned. ensureBrowser() then relaunches on the next tool call, so idle
headless_shell processes accumulate (~6 per relaunch cycle) until the host OOMs.

Fix:
- Add discardBrowser(): best-effort close that never blocks a relaunch.
- launchBrowser() closes the current browser before launching a new one.
- The fallback (system-Chrome) path discards the partially-launched primary
  browser before its second launch attempt.
- The `disconnected` handler now closes the browser (deterministic Playwright
  handle cleanup) and only clears shared state if it still owns `_browser`,
  so a stale event from a prior browser can't null the current one.

Verified on a host with chromium installed: two forced disconnect→relaunch
cycles hold the headless_shell process count flat instead of growing by ~6
each cycle.