fix(client): unref reconnect/connection timers so leaked test clients can't hang Build·Test·Lint#42
Merged
Merged
Conversation
Two TopGunClients created in ORMapPersistence test bodies (the restore-from- storage cases) were never closed. The mocked WebSocket fires onopen so they start a heartbeat; once the suite's afterAll restores the real WebSocket, the leaked clients' heartbeat failure drops into a reconnect loop against real undici (ws://localhost:1234), keeping Jest's worker alive past the last expect(). On CI (no --forceExit after the prior CI-reliability change) this hung the client package's jest process — "Jest did not exit" right after 613 passed — which blocked `pnpm -r` from advancing and got Build·Test·Lint cancelled at the 18-min cap. Two leaked clients matched the two parallel reconnect sequences in the logs. Track both in-test clients and dispose them in afterEach. Client suite now exits cleanly (exit 0, no "Jest did not exit", 613 passed in ~20s) without --forceExit.
Deploying topgun with
|
| Latest commit: |
0d1cbda
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://28e36bc0.topgun-f45.pages.dev |
| Branch Preview URL: | https://fix-sf-451-client-test-recon.topgun-f45.pages.dev |
…t hang jest The Build·Test·Lint job hung because test-created TopGunClients that are never close()d leave SingleServerProvider's reconnect + connection-timeout setTimeouts referenced on the event loop, so the jest worker never exits (masked previously by --forceExit). The ORMapPersistence fix closed two such clients, but the class recurs in any suite that forgets close() (detectOpenHandles flagged more in adapter-better-auth). Systematic fix at the source: unref() both background timers. They are background machinery and must never be the sole reason a process/Jest worker stays alive. unref() is a no-op in browsers (setTimeout returns a number), so behavior there is unchanged; in Node a pending reconnect/timeout no longer blocks process exit. Also dispose the clients started in adapter-better-auth's BetterAuth integration suite (afterAll/afterEach) as accompanying hygiene. Verified: full `pnpm -r exec jest --runInBand` exits cleanly (FULL_EXIT=0) with ZERO "Jest did not exit" warnings across all 10 packages, without --forceExit.
Member
Author
|
Updated: root cause is broader than the initial two clients. Added the systematic source fix — |
…est hang Revert after identification. Prints a marker per pnpm -r package and bounds each with a hard 240s timeout so the hanging package (which does not reproduce locally) names itself and fails fast instead of cancelling the whole job at the 18m cap.
…hang CI The "should NOT throw when --admin admin source is absent" test ran `topgun dev` via execSync without a TOPGUN_SERVER_BINARY override and without a timeout. The server binary is resolved from the installed @topgunbuild/server package, not from cwd, so on CI (where the binary is present) `dev` booted a real foreground server that never returns — execSync hung forever, hanging the cli package and cancelling Build·Test·Lint at the 18-minute cap. It passed locally only because binary resolution from the temp cwd happened to miss. Force TOPGUN_SERVER_BINARY=/nonexistent (matching the sibling test) plus a 30s execSync timeout so the "binary not found → exit 1" path is deterministic on every platform. Also reverts the temporary per-package CI diagnostic that pinpointed this.
doctor exits non-zero when an optional dependency is absent (the Rust toolchain is not installed on the Node CI runner). The raw execSync threw before any assertion ran, so the three doctor tests failed — latent failures never seen before because the suite previously hung on the dev test (alphabetically before doctor) and, before that, --forceExit masked everything. Use the runCli helper (captures stdout/stderr + exit code without throwing) and assert on combined output, which still contains the full environment-check report.
A prettier violation in the SPEC-299 close() change reached main unformatted — the Build·Test·Lint job never ran format:check before because it hung on the cli dev test (now fixed). Whitespace-only (method-chain wrap); no behavior change.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the
Build · Test · Lint(Unit tests) job hanging onmainand being cancelledat its 18-minute cap — the regression exposed once the prior CI-reliability change removed
--forceExit.Root cause (precise)
packages/client/src/__tests__/ORMapPersistence.test.tscreates twoTopGunClients insidetest bodies (the "restore … from storage" cases) that are never closed. The file mocks
WebSocket(firesonopen), so each leaked client connects and starts a heartbeat. Thesuite's
afterAllthen restores the realWebSocket— so when the leaked clients'heartbeat fails (the mock never pongs), they drop into a reconnect loop that now hits real
undici (
ws://localhost:1234), keeping Jest's worker alive past the lastexpect().On CI, with
--forceExitremoved, this hung the client package's jest process — the logshows
Jest did not exit one second after the test run has completedimmediately afterTests: 613 passed— which blockspnpm -rfrom advancing to the next package, so the jobruns to the 18-min
timeout-minutescap and is cancelled. The two leaked clients exactlymatch the two parallel reconnect sequences in the CI log.
SingleServerProvider.close()is correct (it clears the reconnect/connection timers andrejects the in-flight connect promise) — the defect is purely that these two test clients are
never disposed.
Fix
Track both in-test clients and dispose them in
afterEach(same pattern already used inTopGunClient.test.ts).Verification
npx jest --runInBandinpackages/client, without--forceExit:"Jest did not exit"warningsBefore: jest hung indefinitely on the reconnect loop. Closes the open AC1 from the prior
CI-reliability spec (observe a real green Build·Test·Lint run).