Skip to content

Mitigate PostgreSQL connection pool contamination#3862

Draft
robstradling wants to merge 4 commits intogoogle:masterfrom
robstradling:mitigate_postgres_connection_pool_contamination
Draft

Mitigate PostgreSQL connection pool contamination#3862
robstradling wants to merge 4 commits intogoogle:masterfrom
robstradling:mitigate_postgres_connection_pool_contamination

Conversation

@robstradling
Copy link
Copy Markdown
Contributor

When a gRPC context is cancelled (client disconnect or timeout) while a database query is in-flight, pgx closes the underlying connection or leaves it in a dirty transaction state ('T' or 'E'), but the connection pool may then recycle this connection and assign it to a subsequent query, causing "failed to deallocate cached statement(s): conn closed" errors. This pgx behaviour is described by jackc/pgx#2100.

Attempt to work around this problem by registering PrepareConn and AfterRelease callbacks on the pgxpool configuration that check connection health before acquisition and after release. Connections that are closed or not in idle transaction state ('I') are destroyed rather than recycled.

Checklist

When a gRPC context is cancelled (client disconnect or timeout) while a database query is in-flight, pgx closes the underlying connection or leaves it in a dirty transaction state ('T' or 'E'), but the connection pool may then recycle this connection and assign it to a subsequent query, causing "failed to deallocate cached statement(s): conn closed" errors. This pgx behaviour is described by jackc/pgx#2100.

Attempt to work around this problem by registering PrepareConn and AfterRelease callbacks on the pgxpool configuration that check connection health before acquisition and after release. Connections that are closed or not in idle transaction state ('I') are destroyed rather than recycled.
…ation

rollbackInternal(): Check tx.Conn().IsClosed() before attempting rollback.
If the connection is already dead, the server has discarded the transaction,
so attempting rollback is a noisy no-op.

Commit(): Check tx.Conn().IsClosed() before attempting commit, returning an
error immediately if the connection is dead. Use context.WithoutCancel()
(with a 30s timeout) for storeSubtrees() and tx.Commit() to prevent a
concurrent context cancellation from interrupting the commit mid-flight and
triggering the deallocation error.
@robstradling robstradling force-pushed the mitigate_postgres_connection_pool_contamination branch from 231dc9f to 06a6e0a Compare March 12, 2026 20:21
storage/postgresql/admin_storage.go:
- adminTX.Commit(): Check tx.Conn().IsClosed() before attempting commit.
- adminTX.Close(): Check tx.Conn().IsClosed() before attempting rollback.

storage/postgresql/log_storage.go:
- Downgrade "Failed to queue leaf", "Query() ... hash", "rows.Err()",
and "Failed to read returned leaves" warnings to klog.V(1) when the
context is already cancelled, as these are expected under normal gRPC
client timeout/disconnect scenarios.

storage/postgresql/tree_storage.go:
- beginTreeTx(): Download "Could not start tree TX" warning to klog.V(1)
when the context is already cancelled, as these are expected under
normal gRPC client timeout/disconnect scenarios.
storage/postgresql/log_storage.go:

getLeavesByRangeInternal(): Downgrade "Failed to get leaves by
range" and deferred "rows.Err()" to klog.V(1) when ctx.Err() != nil.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants