fix: experimental SQL bugs + CI coverage #288

Draft
NikolayS wants to merge 3 commits into main from claude/fix-experimental-sql-oqxpbr

@NikolayS (Owner)

Bugs

A1 (medium): pgque.throughput() division by zero for sub-minute buckets.
sql/experimental/observability.sql aligned buckets with % (extract(epoch from i_bucket_size)::int / 60). Because the integer division truncates, the divisor is 0 for any bucket size under 60 seconds, so select * from pgque.throughput('q', '1 hour', '30 seconds') aborted with division by zero whenever ticks existed in the window. Sizes between 60 s and 120 s that are not minute multiples (e.g. '90 seconds', where 90 / 60 truncates to 1, giving % 1) silently collapsed to minute boundaries.
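To see both failure modes concretely, here is a minimal Python model of the old alignment. It is an illustration under assumptions, not the removed SQL: the function names are hypothetical, and it assumes the truncated step was applied as a modulus to a minute count.

```python
# Hypothetical model of the pre-fix alignment in pgque.throughput().
# Assumption: the truncated step was used as a modulus on a minute count.


def old_step_minutes(bucket_seconds: int) -> int:
    # extract(epoch from i_bucket_size)::int / 60 is integer division in SQL;
    # for non-negative ints Python's // truncates the same way.
    return bucket_seconds // 60


def old_bucket_start_minute(tick_minute: int, bucket_seconds: int) -> int:
    step = old_step_minutes(bucket_seconds)
    # With step == 0 this raises ZeroDivisionError, the analogue of
    # PostgreSQL's "division by zero" error.
    return tick_minute - tick_minute % step


print(old_step_minutes(30))            # 0: any "% 0" fails
print(old_step_minutes(90))            # 1: every minute becomes its own bucket
print(old_bucket_start_minute(7, 90))  # 7: not aligned to a 90-second grid
```

Under this model, 30 seconds fails outright and 90 seconds degrades to one-minute buckets, matching the two symptoms described above.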

A2 (low): pgque.create_queue(text, jsonb) accepted negative max_retries.
sql/experimental/config_api.sql special-cased 'max_retries' with a direct update pgque.queue statement, bypassing the >= 0 validation in pgque.set_queue_config(). With '{"max_retries": -1}', the coalesce(ev_retry, 0) >= v_max_retries check in pgque.nack() is always true (the retry count is never negative), so every event dead-letters on its first nack. History check: the special case landed in the same commit (8a54a7e) that introduced the validating set_queue_config() override, which already accepts max_retries; the special case was redundant, never a workaround.
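The dead-letter decision reduces to one comparison. A minimal Python model of the quoted check (should_dead_letter is a hypothetical name; only the coalesce(ev_retry, 0) >= v_max_retries expression comes from the description) shows why -1 dead-letters an event that was never retried:

```python
from typing import Optional


def should_dead_letter(ev_retry: Optional[int], max_retries: int) -> bool:
    # coalesce(ev_retry, 0): a NULL retry count counts as zero retries.
    retries = ev_retry if ev_retry is not None else 0
    return retries >= max_retries


print(should_dead_letter(None, 3))   # False: first nack, retries remain
print(should_dead_letter(None, -1))  # True: dead-letters on the first nack
```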

D2 (low): experimental SQL had zero CI coverage.
tests/run_experimental.sql was referenced by no workflow, which is exactly how A1/A2 shipped.

Fixes

  • A1: compute buckets in epoch seconds, via to_timestamp(floor(extract(epoch from tick_time) / extract(epoch from i_bucket_size)) * ...), so any positive interval works; reject i_bucket_size <= interval '0' (or null) with a clear exception.
  • A2: route every option through pgque.set_queue_config(), dropping the special case, so negative values are rejected with the canonical error (max_retries must be >= 0).
  • D2: new Run experimental tests step in the test job of .github/workflows/ci.yml (full PG 14–18 matrix): installs sql/pgque.sql plus the three sql/experimental/*.sql files into a separate pgque_experimental database (so the later idempotency/uninstall steps keep testing the default install) and runs tests/run_experimental.sql.
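The epoch-second alignment from the A1 fix can be sketched in Python as follows. This is a simplified model, not the shipped SQL: it assumes the elided multiplier is the bucket size in seconds, and it models each tick as a (time, events since the previous tick) pair; bucket_start and throughput here are hypothetical helpers.

```python
import math
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


def bucket_start(tick_time: datetime, bucket_size: Optional[timedelta]) -> datetime:
    """Align a timezone-aware tick_time to its bucket start, counted from the Unix epoch.

    Assumes floor(epoch / size) is multiplied back by the bucket size in seconds.
    """
    # Like the fixed SQL, reject null or non-positive bucket sizes up front.
    if bucket_size is None or bucket_size <= timedelta(0):
        raise ValueError("bucket size must be a positive interval")
    size = bucket_size.total_seconds()  # any positive size, including sub-minute
    epoch = tick_time.timestamp()
    return datetime.fromtimestamp(math.floor(epoch / size) * size, tz=timezone.utc)


def throughput(
    ticks: List[Tuple[datetime, int]], bucket_size: timedelta
) -> Dict[datetime, int]:
    """Sum per-tick event counts into epoch-aligned buckets (hypothetical helper)."""
    counts: Counter = Counter()
    for tick_time, events in ticks:
        counts[bucket_start(tick_time, bucket_size)] += events
    return dict(sorted(counts.items()))


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
ticks = [
    (t0 + timedelta(seconds=10), 40),
    (t0 + timedelta(seconds=80), 60),
    (t0 + timedelta(seconds=100), 150),
]
# Two buckets: t0 -> 100 events, t0 + 90 s -> 150 events.
print(throughput(ticks, timedelta(seconds=90)))
```

Because alignment is floor(epoch / size) * size, buckets are anchored at the Unix epoch rather than at minute boundaries; midnight UTC happens to fall on a 90-second boundary because 86400 = 90 * 960.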

Red/green TDD: tests added first and confirmed failing, then the fixes. New tests:

  • tests/test_observability.sql: throughput with '30 seconds' bucket (was: division by zero), exact 90-second bucketing with synthetic ticks (2 buckets, 100/150 events), non-positive bucket size raises.
  • tests/test_experimental_config_api.sql: create_queue('q', '{"max_retries": -1}') raises the canonical error and the queue is not created.
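The test's second expectation (the queue is not created) follows from validating before anything is persisted. In PostgreSQL, an exception raised inside create_queue() presumably aborts the statement and rolls back any queue row already written; this hypothetical Python model stands in for that rollback by staging options in a local dict and publishing the queue only after every option has passed through the one validating setter.

```python
class ConfigError(ValueError):
    """Hypothetical stand-in for the error raised by pgque.set_queue_config()."""


def set_queue_config(config: dict, key: str, value) -> None:
    # Validation rule taken from the PR: max_retries must be >= 0.
    if key == "max_retries" and value < 0:
        raise ConfigError("max_retries must be >= 0")
    config[key] = value


def create_queue(queues: dict, name: str, options: dict) -> None:
    staged: dict = {}
    for key, value in options.items():
        set_queue_config(staged, key, value)  # every option, no special cases
    queues[name] = staged  # reached only if all options validated


queues: dict = {}
create_queue(queues, "ok", {"max_retries": 3})
try:
    create_queue(queues, "q", {"max_retries": -1})
except ConfigError as err:
    print(err)  # max_retries must be >= 0
print(sorted(queues))  # ['ok']: "q" was never created
```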

Verification

Fresh scratch DB, PostgreSQL 16:

createdb pgque_exp
psql -d pgque_exp -v ON_ERROR_STOP=1 -f sql/pgque.sql
psql -d pgque_exp -v ON_ERROR_STOP=1 \
  -f sql/experimental/delayed.sql \
  -f sql/experimental/observability.sql \
  -f sql/experimental/config_api.sql
psql -d pgque_exp -v ON_ERROR_STOP=1 -f tests/run_experimental.sql

Result (tail):

NOTICE:  PASS: throughput() handles 30-second buckets (1 rows)
NOTICE:  PASS: throughput() buckets 90-second intervals from epoch
NOTICE:  PASS: throughput() rejects non-positive bucket size
NOTICE:  PASS: experimental config API
NOTICE:  PASS: create_queue rejects negative max_retries
=== ALL EXPERIMENTAL TESTS PASSED ===

Pre-fix run of the same suite failed with ERROR: division by zero (test 9) and ERROR: create_queue should reject negative max_retries (config test), confirming red before green.

Main suite unaffected — fresh DB:

createdb pgque_all
psql -d pgque_all -v ON_ERROR_STOP=1 -f sql/pgque.sql
psql -d pgque_all -v ON_ERROR_STOP=1 -f tests/run_all.sql

Result: === ALL TESTS PASSED === (exit 0).

sql/experimental/*.sql is not embedded in the generated sql/pgque.sql (verified: no throughput/jsonb create_queue in it), so no rebuild via build/transform.sh was needed.

Addresses findings A1, A2, D2 of #283

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv


Generated by Claude Code

claude added 3 commits June 10, 2026 13:33
pgque.throughput() aligned buckets with an integer division by 60,
so any bucket size under 60 seconds raised division-by-zero, and
60-120 s sizes that are not minute multiples (e.g. 90 seconds)
silently collapsed to minute boundaries.

Compute buckets in epoch seconds instead, so any positive interval
works, and reject non-positive bucket sizes with a clear error.
Tests added first (red), then the fix (green).

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv

pgque.create_queue(text, jsonb) wrote 'max_retries' straight into
pgque.queue, bypassing the >= 0 validation in
pgque.set_queue_config(). A negative value made every nack
dead-letter on first attempt. Route all options through
set_queue_config(), which already accepts max_retries.
Test added first (red), then the fix (green).

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv

sql/experimental/*.sql and tests/run_experimental.sql were not
exercised by any workflow, which is how the throughput and
create_queue bugs shipped. Install the experimental files into a
separate database and run the suite across the full PG matrix.

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv