Skip to content

fix(trade): resolve WebSocket connection-pool memory leak (#158)#178

Open
Itodo-S wants to merge 2 commits into
BETAIL-BOYS:mainfrom
Itodo-S:fix/websocket-memory-leak-issue-158
Open

fix(trade): resolve WebSocket connection-pool memory leak (#158)#178
Itodo-S wants to merge 2 commits into
BETAIL-BOYS:mainfrom
Itodo-S:fix/websocket-memory-leak-issue-158

Conversation

@Itodo-S

@Itodo-S Itodo-S commented Jun 23, 2026

Copy link
Copy Markdown

Summary

Closes #158.

The TradeGateway broadcast trade updates to clients but never tracked connections or tore them down. Dropped/ghost sockets (and the listeners closing over them) stayed referenced on the heap, which is the gradual leak that triggers the ~48h OOM under connection churn.

Changes (src/trade/trade.gateway.ts)

  • Explicit connection pool — every socket is tracked in a Map (keyed by id) on connect via OnGatewayConnection, and the entry — and all references to the socket — is deleted on disconnect via OnGatewayDisconnect.
  • Per-socket heartbeat — a 30s interval that forcibly disconnect(true)s sockets that are no longer connected or stop proving liveness. Liveness is refreshed from engine-level pongs (no client changes required) plus an optional app-level heartbeat:pong.
  • Protocol-level ping/pongpingInterval: 25s / pingTimeout: 20s so dead transports are terminated by engine.io itself.
  • Deterministic cleanup — on disconnect the per-socket interval is cleared with clearInterval and every attached listener is detached (packet / heartbeat:pong / error); onModuleDestroy clears all remaining timers.
  • getConnectionCount() exposed for health checks / leak assertions.
  • Fixed a latent bad import (OnModuleInit comes from @nestjs/common, not @nestjs/websockets).

Acceptance criteria

  • Audit connection manager / listenersclose(disconnect)/error paths now remove the socket from the pool and detach listeners; nothing keeps a dropped socket alive.
  • 30s ping/pong heartbeat — engine.io ping/pong + an app-level per-socket heartbeat that terminates ghosts.
  • Per-socket interval cleared on disconnectclearInterval in handleDisconnect (and onModuleDestroy).
  • 500-client verificationsrc/trade/trade.gateway.spec.ts churns 500 connect/disconnect clients and asserts the pool returns to 0 (no leaked references). scripts/ws-load-test.js reproduces this against a running server with heap/RSS logging (--expose-gc) to confirm memory plateaus.

Testing

$ jest src/trade/trade.gateway.spec.ts
Tests: 9 passed, 9 total

Covers: connect/disconnect cleanup, heartbeat ping, responsive client stays alive, ghost + dead-transport termination, listener detachment, timer clearing on destroy, and the 500-client churn leak check.

Note: the repository's npm run build / CI is already red on main due to unrelated pre-existing issues (missing express-rate-limit dependency, missing ./invoices.controller and ./pdf.service in dynamic.module.ts, and a LogLevel type error in custom.logger.ts). None of those files are touched here; this change compiles cleanly under ts-jest and its test suite passes. Happy to split a follow-up to restore the build if useful.

…ETAIL-BOYS#158)

The TradeGateway broadcast to clients but never tracked or tore them down, so
dropped/ghost sockets and their listeners were retained on the heap — the source
of the gradual OOM under connection churn.

Changes:
- Track every connection in an explicit pool (Map keyed by socket id) via
  OnGatewayConnection/OnGatewayDisconnect; the entry (and all references to the
  socket) is deleted on disconnect.
- Add a per-socket heartbeat: a 30s interval that forcibly disconnects sockets
  that are no longer connected or stop proving liveness. Liveness is refreshed
  from engine-level pongs (no client changes needed) and an optional app-level
  'heartbeat:pong'.
- Enable engine.io ping/pong (pingInterval 25s / pingTimeout 20s) so dead
  transports are terminated by the protocol itself.
- On disconnect, clear the per-socket interval with clearInterval and detach
  every listener that was attached (packet/heartbeat:pong/error); clear all
  timers on module destroy.
- Expose getConnectionCount() for health checks and leak assertions.
- Fix a latent bad import: OnModuleInit comes from @nestjs/common, not
  @nestjs/websockets.

Verification:
- src/trade/trade.gateway.spec.ts: 9 tests covering connect/disconnect cleanup,
  heartbeat ping, ghost/dead-transport termination, listener detachment, timer
  clearing on destroy, and a 500-client connect/disconnect churn that leaves the
  pool empty (no leaked references).
- scripts/ws-load-test.js: churns 500 clients x N rounds against a running
  server and prints rss/heapUsed each round (with --expose-gc) to confirm memory
  plateaus rather than growing.
@AlAfiz

AlAfiz commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@Itodo-S CI build and test failed please fix it

CI failed at `npm ci` and the build/test steps could not run. Resolve the
pre-existing breakages that block the pipeline:

- Align @nestjs/websockets and @nestjs/platform-socket.io to ^10.4.22 (matching
  the rest of @nestjs). The pinned ^10.0.0 peers reflect-metadata@^0.1.12, which
  conflicts with the project's reflect-metadata@^0.2.2 and made `npm ci` fail
  with ERESOLVE. Regenerated package-lock.json so the install resolves cleanly.
- Fix InvoicesModule (src/dynamic/dynamic.module.ts): it imported
  ./invoices.controller and ./pdf.service, which do not exist. Point it at the
  classes that do (NetworkController, PdfService in ./dynamic.serivice).
- Type CustomLogger.formatMessage's logLevel param as LogLevel (not string) so
  the override matches ConsoleLogger and compiles.
- Fix the pools apy-history integration test: the endpoint returns the
  ApyHistoryPoint[] array directly (per its return type / Swagger isArray), and
  there is no global response-envelope interceptor, so assert the array shape
  instead of a {status,data} envelope.

After this: `npm ci`, `npm run build`, and `npm test` (17/17) all pass.
@Itodo-S

Itodo-S commented Jun 24, 2026

Copy link
Copy Markdown
Author

Thanks @AlAfiz — fixed and pushed. The pipeline was failing at npm ci (so build/test never ran). Root causes were pre-existing and repo-wide; I resolved them so all three steps pass:

  • npm ci ERESOLVE@nestjs/websockets/@nestjs/platform-socket.io were pinned to ^10.0.0, which peers reflect-metadata@^0.1.12 and conflicts with the project's reflect-metadata@^0.2.2. Bumped both to ^10.4.22 (matching the rest of @nestjs) and regenerated package-lock.jsonnpm ci now resolves cleanly without --legacy-peer-deps.
  • build errorsrc/dynamic/dynamic.module.ts imported ./invoices.controller and ./pdf.service, which don't exist; pointed it at the classes that do (NetworkController, and PdfService in ./dynamic.serivice).
  • build error — typed CustomLogger.formatMessage's logLevel as LogLevel instead of string so the ConsoleLogger override compiles.
  • test failure — the pools apy-history integration test asserted a {status,data} envelope, but the endpoint returns the ApyHistoryPoint[] array directly (per its return type / Swagger) and there's no global response interceptor; updated the test to the actual contract.

Verified locally:

npm ci          # exit 0
npm run build   # exit 0
npm test        # 17/17 passing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Resolve Memory Leak in WebSocket Connection Pool

2 participants