Add optional server-side player keepalive to reap dead connections #912

Merged
mcottontensor merged 1 commit into EpicGames:master from marekl11:fix/server-player-keepalive
Jun 19, 2026

@marekl11

Contributor

Relevant components:

  • Signalling server
  • Common library
  • Frontend library
  • Frontend UI library
  • Matchmaker
  • Platform scripts
  • SFU

Problem statement:

The signalling server never actively checks that a connected player is still alive. KeepaliveMonitor is only wired up on the client (browser to server); the server just replies to pings, it doesn't send its own. A player is only removed from the registry when its WebSocket fires close.

That's fine when a player disconnects cleanly, but if the socket dies without a close frame (a laptop going to sleep, Wi-Fi dropping, a tunnel resetting, the tab being killed), the server never hears about it. The dead player stays in the registry and stays subscribed to its streamer until the OS TCP keepalive eventually reaps the socket. On Windows that keepalive defaults to around two hours, and ws doesn't enable it.

With the default --max_players 0 (unlimited) this is mostly harmless, which is probably why it hasn't come up. But as soon as you cap subscribers — --max_players 1 for a single-viewer experience — that ghost holds the only slot, and every subsequent viewer is turned away with "Max players reached" until the server is restarted.
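To make the failure mode concrete, here is a minimal sketch of a capped registry whose only removal path is the close handler. `PlayerRegistrySketch`, `tryAdd` and `remove` are hypothetical names for illustration, not the signalling library's actual classes or methods.

```typescript
// Hypothetical registry, for illustration only; not the library's real class.
class PlayerRegistrySketch {
    private readonly players = new Set<string>();

    constructor(private readonly maxPlayers: number) {}

    // Returns false when the cap is reached. A cap of 0 means unlimited.
    tryAdd(id: string): boolean {
        if (this.maxPlayers > 0 && this.players.size >= this.maxPlayers) {
            return false;
        }
        this.players.add(id);
        return true;
    }

    // In the scenario described above this only runs from the WebSocket close handler.
    remove(id: string): void {
        this.players.delete(id);
    }
}

const registry = new PlayerRegistrySketch(1);
const firstAccepted = registry.tryAdd("viewer-a");
// viewer-a's machine goes to sleep: no close event fires, so remove() is never called.
const secondAccepted = registry.tryAdd("viewer-b");
```

With a cap of 1, `firstAccepted` is true and `secondAccepted` is false, and it stays that way until something calls `remove("viewer-a")`.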

Solution

Add the missing server-to-browser half of the keepalive. When a player connects, the server can now start a KeepaliveMonitor on that connection (the same class already used client-side); the browser answers these pings automatically via handlePingMessage, so no frontend change is needed. If a player misses the keepalive, the server calls ws.terminate() — a forceful close rather than a graceful one, because a dead peer never completes the close handshake — which fires close immediately and frees the slot.
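The ping-then-reap idea can be sketched roughly as follows. This is not the actual KeepaliveMonitor code: `PlayerKeepaliveSketch`, `PlayerSocket`, `onPong` and `check` are hypothetical names, and the ping payload shape is an assumption. The one real API relied on is the `ws` package's `terminate()`, which destroys the socket without waiting for a close handshake.

```typescript
// Minimal slice of a WebSocket-like connection (assumed shape, for illustration).
interface PlayerSocket {
    send(data: string): void;
    terminate(): void; // forceful close, as ws.terminate() does
}

// Hypothetical stand-in for the real KeepaliveMonitor.
class PlayerKeepaliveSketch {
    private awaitingPong = false;
    private timer: ReturnType<typeof setInterval> | undefined;

    constructor(
        private readonly socket: PlayerSocket,
        private readonly timeoutMs: number
    ) {}

    // Run one check per timeout interval. A timeout of 0 disables the monitor.
    start(): void {
        if (this.timeoutMs <= 0 || this.timer !== undefined) return;
        this.timer = setInterval(() => this.check(), this.timeoutMs);
    }

    // Call on transport close so the timer does not outlive the socket.
    stop(): void {
        if (this.timer !== undefined) {
            clearInterval(this.timer);
            this.timer = undefined;
        }
    }

    // Call when the player answers a ping.
    onPong(): void {
        this.awaitingPong = false;
    }

    // One keepalive round: reap if the previous ping went unanswered, otherwise ping again.
    check(): void {
        if (this.awaitingPong) {
            this.stop();
            this.socket.terminate();
            return;
        }
        this.awaitingPong = true;
        // Payload shape is an assumption made for this sketch.
        this.socket.send(JSON.stringify({ type: "ping", time: Date.now() }));
    }
}
```

In this sketch, a connection that dies just after answering a ping is terminated on the second check that follows, and one that dies just before a check is terminated on the next one. That is one way a reaping delay of between one and two timeout intervals can arise.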

It's controlled by a new IServerConfig.playerKeepaliveTimeout (milliseconds, 0 disables it). The signalling web server exposes it as --player_keepalive_timeout. I've defaulted it to 30000 (30s): reaping a genuinely dead connection is a correctness improvement that I think is worth having on by default, but it's fully tunable and can be turned off. A live browser always responds to the ping, so a real viewer is never evicted; only a connection that has gone completely silent is. The monitor stops itself on transport close, so there's no extra teardown.
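As a rough illustration of those semantics, the sketch below resolves the effective timeout from a config object. Only `playerKeepaliveTimeout`, the 30000 default and the "0 disables" rule come from the description above. `ServerConfigSketch` and `resolvePlayerKeepalive` are hypothetical, where the default is actually applied in the real code is not shown here, and treating negative values as disabled is an assumption of this sketch.

```typescript
// Hypothetical slice of the server config; only playerKeepaliveTimeout is named in this PR.
interface ServerConfigSketch {
    playerKeepaliveTimeout?: number; // milliseconds; 0 disables the keepalive
}

const DEFAULT_PLAYER_KEEPALIVE_TIMEOUT_MS = 30000;

// Returns the effective timeout in ms, or undefined when the keepalive is disabled.
function resolvePlayerKeepalive(config: ServerConfigSketch): number | undefined {
    const timeout = config.playerKeepaliveTimeout ?? DEFAULT_PLAYER_KEEPALIVE_TIMEOUT_MS;
    // Assumption: anything that is not a positive number is treated as "off".
    return timeout > 0 ? timeout : undefined;
}
```

For example, `resolvePlayerKeepalive({})` yields 30000, while `resolvePlayerKeepalive({ playerKeepaliveTimeout: 0 })` yields `undefined`, which matches passing `--player_keepalive_timeout 0` to restore the old behaviour.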

Documentation

SignallingWebServer/README.md is updated with the new option, and the config field and the logic both have explanatory comments.

Test Plan and Compatibility

npm run build and npm run lint pass for Common, Signalling and SignallingWebServer. I tested by connecting a viewer and then dropping it ungracefully (killing the browser process / sleeping the machine / pulling Wi-Fi) — the server logs the keepalive failure and frees the slot within roughly one to two timeout intervals, and a new viewer can connect. The clean paths (normal disconnect, and a long idle-but-alive session) behave exactly as before and never evict a live viewer. Setting --player_keepalive_timeout 0 restores the previous behaviour exactly.

@mcottontensor mcottontensor merged commit 111007d into EpicGames:master Jun 19, 2026
8 of 9 checks passed