Mitigate PostgreSQL connection pool contamination#3862
Draft
robstradling wants to merge 4 commits into google:master
When a gRPC context is cancelled (client disconnect or timeout) while a database query is in flight, pgx either closes the underlying connection or leaves it in a dirty transaction state ('T' or 'E'). The connection pool may then recycle this connection and assign it to a subsequent query, causing "failed to deallocate cached statement(s): conn closed" errors. This pgx behaviour is described in jackc/pgx#2100.
Attempt to work around this problem by registering PrepareConn and AfterRelease callbacks on the pgxpool configuration that check connection health before acquisition and after release. Connections that are closed or not in idle transaction state ('I') are destroyed rather than recycled.
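The health check described above can be sketched without any pgx dependency. This is only an illustration of the decision rule, not the code in this PR: `connState` and `reusable` are hypothetical names, and the assumption is that with pgx v5 the two inputs would be read from `conn.IsClosed()` and `conn.PgConn().TxStatus()` inside the pool callbacks.

```go
package main

import "fmt"

// connState is a hypothetical stand-in for the two facts the pool
// callbacks inspect. With pgx v5 they would presumably come from
// conn.IsClosed() and conn.PgConn().TxStatus() (an assumption here).
type connState struct {
	closed   bool
	txStatus byte // 'I' = idle, 'T' = in transaction, 'E' = failed transaction
}

// reusable reports whether a connection may be handed out or returned
// to the pool. Anything closed or not idle should be destroyed instead.
func reusable(s connState) bool {
	return !s.closed && s.txStatus == 'I'
}

func main() {
	states := []connState{
		{closed: false, txStatus: 'I'},
		{closed: false, txStatus: 'T'},
		{closed: false, txStatus: 'E'},
		{closed: true, txStatus: 'I'},
	}
	for _, s := range states {
		fmt.Printf("closed=%t txStatus=%c reusable=%t\n", s.closed, s.txStatus, reusable(s))
	}
}
```

Only the first state is reported as reusable. A callback returning this boolean would let the pool destroy every other connection rather than recycle it.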
…ation
rollbackInternal(): Check tx.Conn().IsClosed() before attempting rollback. If the connection is already dead, the server has discarded the transaction, so attempting rollback is a noisy no-op.
Commit(): Check tx.Conn().IsClosed() before attempting commit, returning an error immediately if the connection is dead.
Use context.WithoutCancel() (with a 30s timeout) for storeSubtrees() and tx.Commit() to prevent a concurrent context cancellation from interrupting the commit mid-flight and triggering the deallocation error.
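The detached-context pattern from this commit can be sketched with the standard library alone (Go 1.21 or later for `context.WithoutCancel`). The helper names `commitContext` and `demo` are hypothetical and only illustrate the idea; they are not taken from the PR.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// commitContext (hypothetical helper) returns a context that keeps the
// parent's values but ignores its cancellation, bounded by its own timeout.
func commitContext(parent context.Context, timeout time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.WithoutCancel(parent), timeout)
}

// demo cancels the parent after deriving the commit context and returns
// both error states so the difference can be observed.
func demo() (parentErr, commitErr error) {
	parent, cancelParent := context.WithCancel(context.Background())
	commitCtx, cancel := commitContext(parent, 30*time.Second)
	defer cancel()

	cancelParent() // simulates a gRPC client disconnect mid-commit
	return parent.Err(), commitCtx.Err()
}

func main() {
	parentErr, commitErr := demo()
	fmt.Println("parent:", parentErr) // context canceled
	fmt.Println("commit:", commitErr) // <nil>
}
```

The 30s timeout still bounds the detached work, so a stuck commit cannot hold a connection indefinitely.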
231dc9f to 06a6e0a
storage/postgresql/admin_storage.go:
- adminTX.Commit(): Check tx.Conn().IsClosed() before attempting commit.
- adminTX.Close(): Check tx.Conn().IsClosed() before attempting rollback.
storage/postgresql/log_storage.go:
- Downgrade "Failed to queue leaf", "Query() ... hash", "rows.Err()", and "Failed to read returned leaves" warnings to klog.V(1) when the context is already cancelled, as these are expected under normal gRPC client timeout/disconnect scenarios.
storage/postgresql/tree_storage.go:
- beginTreeTx(): Downgrade "Could not start tree TX" warning to klog.V(1) when the context is already cancelled, as this is expected under normal gRPC client timeout/disconnect scenarios.
storage/postgresql/log_storage.go: getLeavesByRangeInternal(): Downgrade "Failed to get leaves by range" and deferred "rows.Err()" to klog.V(1) when ctx.Err() != nil.
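The log downgrade in these commits follows one rule: if the context is already cancelled, the failure is expected and should not be logged as a warning. A minimal stdlib-only sketch of that rule, with hypothetical names (`verbosityFor`, `demo`), where 0 stands in for a warning and 1 for `klog.V(1)`:

```go
package main

import (
	"context"
	"fmt"
)

// verbosityFor (hypothetical helper) picks a log verbosity for a storage
// error: 0 stands in for a warning, 1 for klog.V(1). Errors seen after
// the caller's context has ended are treated as expected noise.
func verbosityFor(ctx context.Context) int {
	if ctx.Err() != nil {
		return 1
	}
	return 0
}

// demo evaluates the rule before and after cancelling the same context.
func demo() (live, cancelled int) {
	ctx, cancel := context.WithCancel(context.Background())
	live = verbosityFor(ctx)
	cancel()
	cancelled = verbosityFor(ctx)
	return live, cancelled
}

func main() {
	live, cancelled := demo()
	fmt.Println("live context verbosity:", live)
	fmt.Println("cancelled context verbosity:", cancelled)
}
```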
dhggsf approved these changes Mar 25, 2026