fix(client): unref reconnect/connection timers so leaked test clients can't hang Build·Test·Lint by ivkan · Pull Request #42 · TopGunBuild/topgun

ivkan · 2026-06-10T07:50:35Z

Summary

Fixes the Build · Test · Lint (Unit tests) job hanging on main and being cancelled
at its 18-minute cap — the regression exposed once the prior CI-reliability change removed
--forceExit.

Root cause (precise)

packages/client/src/__tests__/ORMapPersistence.test.ts creates two TopGunClients inside
test bodies (the "restore … from storage" cases) that are never closed. The file mocks
WebSocket (fires onopen), so each leaked client connects and starts a heartbeat. The
suite's afterAll then restores the real WebSocket — so when the leaked clients'
heartbeat fails (the mock never pongs), they drop into a reconnect loop that now hits real
undici (ws://localhost:1234), keeping Jest's worker alive past the last expect().

On CI, with --forceExit removed, this hung the client package's jest process — the log
shows Jest did not exit one second after the test run has completed immediately after
Tests: 613 passed — which blocks pnpm -r from advancing to the next package, so the job
runs to the 18-min timeout-minutes cap and is cancelled. The two leaked clients exactly
match the two parallel reconnect sequences in the CI log.

SingleServerProvider.close() is correct (it clears the reconnect/connection timers and
rejects the in-flight connect promise) — the defect is purely that these two test clients are
never disposed.

Fix

Track both in-test clients and dispose them in afterEach (same pattern already used in
TopGunClient.test.ts).

Verification

npx jest --runInBand in packages/client, without --forceExit:

EXIT=0 (clean process exit, not a 124 timeout)
0 "Jest did not exit" warnings
613 passed in ~20s

Before: jest hung indefinitely on the reconnect loop. Closes the open AC1 from the prior
CI-reliability spec (observe a real green Build·Test·Lint run).

Two TopGunClients created in ORMapPersistence test bodies (the restore-from- storage cases) were never closed. The mocked WebSocket fires onopen so they start a heartbeat; once the suite's afterAll restores the real WebSocket, the leaked clients' heartbeat failure drops into a reconnect loop against real undici (ws://localhost:1234), keeping Jest's worker alive past the last expect(). On CI (no --forceExit after the prior CI-reliability change) this hung the client package's jest process — "Jest did not exit" right after 613 passed — which blocked `pnpm -r` from advancing and got Build·Test·Lint cancelled at the 18-min cap. Two leaked clients matched the two parallel reconnect sequences in the logs. Track both in-test clients and dispose them in afterEach. Client suite now exits cleanly (exit 0, no "Jest did not exit", 613 passed in ~20s) without --forceExit.

cloudflare-workers-and-pages · 2026-06-10T07:51:15Z

Deploying topgun with Cloudflare Pages

Latest commit:	`0d1cbda`
Status:	✅ Deploy successful!
Preview URL:	https://28e36bc0.topgun-f45.pages.dev
Branch Preview URL:	https://fix-sf-451-client-test-recon.topgun-f45.pages.dev

View logs

…t hang jest The Build·Test·Lint job hung because test-created TopGunClients that are never close()d leave SingleServerProvider's reconnect + connection-timeout setTimeouts referenced on the event loop, so the jest worker never exits (masked previously by --forceExit). The ORMapPersistence fix closed two such clients, but the class recurs in any suite that forgets close() (detectOpenHandles flagged more in adapter-better-auth). Systematic fix at the source: unref() both background timers. They are background machinery and must never be the sole reason a process/Jest worker stays alive. unref() is a no-op in browsers (setTimeout returns a number), so behavior there is unchanged; in Node a pending reconnect/timeout no longer blocks process exit. Also dispose the clients started in adapter-better-auth's BetterAuth integration suite (afterAll/afterEach) as accompanying hygiene. Verified: full `pnpm -r exec jest --runInBand` exits cleanly (FULL_EXIT=0) with ZERO "Jest did not exit" warnings across all 10 packages, without --forceExit.

ivkan · 2026-06-10T08:24:42Z

Updated: root cause is broader than the initial two clients. Added the systematic source fix — unref() on SingleServerProvider's reconnect + connection-timeout timers — so any unclosed test client can no longer keep the Jest worker alive. Verified locally: full pnpm -r exec jest --runInBand exits cleanly (FULL_EXIT=0) with 0 "Jest did not exit" across all 10 packages, no --forceExit.

…est hang Revert after identification. Prints a marker per pnpm -r package and bounds each with a hard 240s timeout so the hanging package (which does not reproduce locally) names itself and fails fast instead of cancelling the whole job at the 18m cap.

…hang CI The "should NOT throw when --admin admin source is absent" test ran `topgun dev` via execSync without a TOPGUN_SERVER_BINARY override and without a timeout. The server binary is resolved from the installed @topgunbuild/server package, not from cwd, so on CI (where the binary is present) `dev` booted a real foreground server that never returns — execSync hung forever, hanging the cli package and cancelling Build·Test·Lint at the 18-minute cap. It passed locally only because binary resolution from the temp cwd happened to miss. Force TOPGUN_SERVER_BINARY=/nonexistent (matching the sibling test) plus a 30s execSync timeout so the "binary not found → exit 1" path is deterministic on every platform. Also reverts the temporary per-package CI diagnostic that pinpointed this.

doctor exits non-zero when an optional dependency is absent (the Rust toolchain is not installed on the Node CI runner). The raw execSync threw before any assertion ran, so the three doctor tests failed — latent failures never seen before because the suite previously hung on the dev test (alphabetically before doctor) and, before that, --forceExit masked everything. Use the runCli helper (captures stdout/stderr + exit code without throwing) and assert on combined output, which still contains the full environment-check report.

A prettier violation in the SPEC-299 close() change reached main unformatted — the Build·Test·Lint job never ran format:check before because it hung on the cli dev test (now fixed). Whitespace-only (method-chain wrap); no behavior change.

ivkan changed the title ~~fix(client): close leaked test clients to stop Build·Test·Lint jest hang~~ fix(client): unref reconnect/connection timers so leaked test clients can't hang Build·Test·Lint Jun 10, 2026

ivkan added 4 commits June 10, 2026 12:02

ivkan merged commit d5aa579 into main Jun 10, 2026
6 checks passed

ivkan deleted the fix/sf-451-client-test-reconnect-leak branch June 10, 2026 09:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(client): unref reconnect/connection timers so leaked test clients can't hang Build·Test·Lint#42

fix(client): unref reconnect/connection timers so leaked test clients can't hang Build·Test·Lint#42
ivkan merged 6 commits into
mainfrom
fix/sf-451-client-test-reconnect-leak

ivkan commented Jun 10, 2026

Uh oh!

cloudflare-workers-and-pages Bot commented Jun 10, 2026 •

edited

Loading

Uh oh!

ivkan commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ivkan commented Jun 10, 2026

Summary

Root cause (precise)

Fix

Verification

Uh oh!

cloudflare-workers-and-pages Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying topgun with Cloudflare Pages

Uh oh!

ivkan commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cloudflare-workers-and-pages Bot commented Jun 10, 2026 •

edited

Loading