
Add global query timeout to SELECT queries #4272

Merged
jmg-duarte merged 13 commits into main from jmgd/db-query-timeout on Apr 8, 2026

@jmg-duarte jmg-duarte commented Mar 18, 2026

Description

Following the issues we faced with RDS, and the fact that SQLx does not provide global timeouts, this PR uses PostgreSQL's native statement_timeout to limit the duration of read queries on the orderbook's read replica.

Instead of wrapping every query on the client side, the timeout is set once per connection via the pool's after_connect hook. This means every query issued through that pool is automatically subject to the configured timeout — no per-call changes needed.

The timeout is only applied to the orderbook read replica pool (created with Postgres::try_new_with_timeout). Write pools and other services are unaffected.
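
As a rough illustration of the per-connection setup described above (not the code from this PR), the statement issued in after_connect could be built from a configured Duration. The helper name statement_timeout_sql is hypothetical. The value is rendered as integer milliseconds because PostgreSQL's SET does not accept bind parameters, so it has to be formatted into the statement text.

```rust
use std::time::Duration;

/// Hypothetical helper: builds the session-level `SET statement_timeout`
/// statement for a given timeout.
///
/// The value is formatted as whole milliseconds, so the resulting SQL never
/// contains free-form text. Note that a zero (or sub-millisecond) duration
/// renders as `0ms`, which PostgreSQL interprets as "timeout disabled".
fn statement_timeout_sql(timeout: Duration) -> String {
    let millis = timeout.as_millis();
    format!("SET statement_timeout = '{millis}ms'")
}
```

The resulting string would then be executed once per new connection from the pool's after_connect hook, so every later query on that connection inherits the limit.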

Changes

  • crates/orderbook/src/database/mod.rs — Added statement_timeout to Config and a new Postgres::try_new_with_timeout constructor that sets statement_timeout on every connection via after_connect. Includes a postgres integration test using pg_sleep.
  • crates/orderbook/src/run.rs — The read replica pool now uses try_new_with_timeout.
  • crates/configs/src/database.rs — Added statement_timeout field to DatabasePoolConfig (defaults to 30s, deserialized via humantime_serde).
  • crates/shared/src/arguments.rs — Added --statement-timeout CLI flag (used by refunder).
  • crates/database/src/trades.rs — Changed trades() from returning a BoxStream to fetch_all, since streaming doesn't play well with connection-level timeouts.
  • Minor whitespace cleanups in other crates/database/src/*.rs files and removal of an unrelated stale config section in the playground autopilot config.

How to test

Run the integration test against a local postgres (docker compose up -d first):

cargo nextest run -p orderbook statement_timeout --test-threads 1 --run-ignored ignored-only
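
The test itself is not reproduced here. As a hedged sketch of what a caller can check when the timeout fires: PostgreSQL cancels the statement and reports SQLSTATE 57014 (query_canceled). The helper below is hypothetical and only classifies a SQLSTATE string. With SQLx the code would typically be read from the database error's code() accessor, which is an assumption about how the caller is wired rather than something shown in this PR.

```rust
/// SQLSTATE that PostgreSQL reports for `query_canceled`.
/// A statement hitting `statement_timeout` fails with this code.
const QUERY_CANCELED: &str = "57014";

/// Hypothetical helper: returns true when the SQLSTATE indicates that the
/// statement was canceled.
///
/// The same code is also used for manual cancellation (for example via
/// `pg_cancel_backend`), so a match is consistent with a timeout but does
/// not prove one.
fn is_query_canceled(sqlstate: Option<&str>) -> bool {
    sqlstate == Some(QUERY_CANCELED)
}
```

A test along the lines described above could run a pg_sleep longer than the configured timeout and assert that the returned error's code satisfies this check.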

@jmg-duarte jmg-duarte marked this pull request as ready for review March 19, 2026 16:22
@jmg-duarte jmg-duarte requested a review from a team as a code owner March 19, 2026 16:22
@jmg-duarte jmg-duarte changed the title from "PoC: DB query timeout" to "Add global query timeout to SELECT queries" Mar 19, 2026
@gemini-code-assist

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

Comment thread crates/database/src/timeout.rs Outdated
Comment thread scripts/eip712_order_hash.sh Outdated

fafk commented Mar 20, 2026

WDYT about setting this within the connection instead?

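use sqlx::{Executor, postgres::PgPoolOptions};

let pool = PgPoolOptions::new()
    .max_connections(config.max_pool_size.get())
    // Runs once for every new connection the pool opens.
    .after_connect(|conn, _meta| Box::pin(async move {
        // Session-level setting: applies to every statement on this connection.
        conn.execute("SET statement_timeout = '30s'").await?;
        Ok(())
    }))
    .connect(url)
    .await?;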

EDIT: this also caps inserts, which we don't want, but we can apply it only to the read-only replica pool and then it's fine.

squadgazzz commented

> WDYT about setting this within the connection instead?

I was writing the same comment and only now noticed it. I think this PR is overkill.


squadgazzz commented Mar 31, 2026

A few things about the PR description vs the actual diff:

The description is out of date. It describes an application-level timeout approach (extension traits QueryAsTimeoutExt/QueryScalarTimeoutExt, macros, wrapping each fetch_* call in timeout.rs) — but the actual implementation uses PostgreSQL statement_timeout via after_connect. The new approach is cleaner, but the description should be updated to match.

try_new_with_timeout is defined but never called in production. The read pool in orderbook/src/run.rs:187 still uses Postgres::try_new, not try_new_with_timeout. The statement_timeout config is wired through but never applied. Only the test exercises it.

Described changes not in the diff:

  • crates/database/src/timeout.rs module
  • fetch_*_with_timeout variants on read queries
  • autopilot/src/run.rs and refunder/src/lib.rs changes

Undescribed changes in the diff:

  • trades.rs: changed from streaming (BoxStream + fetch) to batch (fetch_all + Vec) — behavioral change worth calling out with the reasoning
  • Removed [order-quoting] section from playground/configs/autopilot.toml — seems unrelated to database timeouts

Comment thread crates/orderbook/src/run.rs
Comment thread crates/orderbook/src/database/mod.rs Outdated
Comment thread crates/orderbook/src/database/mod.rs Outdated

@MartinquaXD MartinquaXD left a comment

Nice! That's a lot simpler and less error-prone than the original implementation of this idea.
Please do update the PR description to reflect the new approach though, as @squadgazzz already suggested.

Comment thread crates/orderbook/src/database/mod.rs Outdated
Co-authored-by: Martin Magnus <martin.beckmann@protonmail.com>
@jmg-duarte jmg-duarte requested a review from squadgazzz April 8, 2026 09:41

@squadgazzz squadgazzz left a comment

Very nice!

@jmg-duarte jmg-duarte added this pull request to the merge queue Apr 8, 2026
Merged via the queue into main with commit d6a912d Apr 8, 2026
20 checks passed
@jmg-duarte jmg-duarte deleted the jmgd/db-query-timeout branch April 8, 2026 10:02
@github-actions github-actions bot locked and limited conversation to collaborators Apr 8, 2026

4 participants