Fix connection pool lifecycle to stop leaking connections#177
Open
jens wants to merge 1 commit into
Open
Conversation
The stdio server could outlive its client and hold pooled connections indefinitely. A client normally signals shutdown by closing stdin, which makes run_stdio_async() return. But if the client is force-killed, stdin EOF never arrives, run_stdio_async() never returns, and the process is reparented to init — lingering forever. With min_size=1 each such process pins at least one connection, so a few orphans exhaust a low per-role connection limit, after which every new server fails with "too many connections for role". - Pool: min_size=0 + max_idle=60 so an idle server drifts back to zero connections; lower max_size 5 -> 3. - stdio: run a lightweight watchdog alongside the transport that detects orphaning (parent reparented to PID 1), closes the pool, and exits. The transport coroutine can't be cancelled cleanly (it blocks in a stdin reader thread), so the watchdog releases the pool and hard-exits. - Release the pool in a finally on every clean exit path (stdin EOF, signal, or error), and stop the watchdog there. Add unit tests for the watchdog behavior and per-transport wiring.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The stdio server can outlive its client and hold pooled connections indefinitely, eventually exhausting the database role's connection limit. Once that happens, every newly started server fails to connect with
FATAL: too many connections for role "...".This is the leak described in #98.
Root cause
An MCP client signals shutdown by closing the server's stdin, which makes
run_stdio_async()return. But if the client is force-killed (crash,kill -9, or a supervisor tearing down the process tree), stdin EOF never arrives —run_stdio_async()never returns and the process is reparented to init, lingering indefinitely.Each lingering process keeps its pool open. With
min_size=1the pool pins at least one connection for the whole life of the process, so a handful of orphaned servers is enough to exhaust a modest per-roleCONNECTION LIMIT.Fix
min_size=0+max_idle=60, so an idle server holds no connections and drifts back to zero between queries;max_sizelowered 5 → 3.finallyon every clean exit path (stdin EOF, signal, error), and the watchdog is cancelled there.Either change alone reduces the impact; together, an idle or orphaned server holds zero connections, and an orphaned one exits within a couple of seconds.
Tests
tests/unit/test_lifecycle.pycovers: the watchdog closes the pool before exiting, waits until actually orphaned, is started exactly once and cancelled on clean stdio exit, and is never started for thesse/streamable-httptransports.Notes
Orphan detection keys on reparenting to PID 1, the common case on Linux and macOS. Under a PID namespace or a custom subreaper the parent may differ; there the watchdog simply doesn't fire, and the
min_size=0change still bounds the leak.Fixes #98