Skip to content

fix: harden uninstall paths (stop jobs, narrow guards)#290

Draft
NikolayS wants to merge 4 commits into
mainfrom
claude/fix-uninstall-scripts-oqxpbr
Draft

fix: harden uninstall paths (stop jobs, narrow guards)#290
NikolayS wants to merge 4 commits into
mainfrom
claude/fix-uninstall-scripts-oqxpbr

Conversation

@NikolayS

Copy link
Copy Markdown
Owner

Bugs

Four uninstall-path findings from #283:

  • C4 (medium): sql/pgque-tle-uninstall.sql ran drop extension if exists pgque cascade without calling pgque.stop() first. pg_cron / pg_timetable jobs are catalog rows, not dependent objects, so pgque_ticker (every 1 s), pgque_retry_events, pgque_maint, and pgque_rotate_step2 survived the drop and failed every 1–30 s forever ("schema pgque does not exist"), spamming logs and cron.job_run_details.
  • C6 (low): sql/pgque_uninstall.sql wrapped pgque.stop() in exception when others then null, silently swallowing real failures (e.g. cron.unschedule permission errors, the "untrusted pg_timetable schema owner" exception) and dropping the schema anyway — same orphaned-jobs outcome, but silent.
  • C7 (low): the generated sql/pgque-tle.sql header claimed the file "works in psql, GUI tools (DBeaver, etc.), JDBC, libpq-direct callers", but the script uses \set ON_ERROR_STOP and \echo psql meta-commands, so any non-psql client fails on the first one.
  • C9 (low): pgque.uninstall() ran drop schema pgque cascade, which fails on extension (pg_tle) installs with a confusing dependency error ("extension pgque requires it").

Fixes

  • C4: sql/pgque-tle-uninstall.sql now calls pgque.stop() before drop extension, with the drop in the same do block, so a real stop() failure aborts the uninstall before anything is dropped.
  • C6: both uninstall scripts narrow the handler to undefined_function / invalid_schema_name — empirically the sqlstates raised when pgque is not installed (42883 / 3F000) — keeping the scripts idempotent without swallowing real stop() failures. sql/pgque_uninstall.sql also performs the drop inside the same do block, so abort-before-drop holds regardless of client ON_ERROR_STOP settings.
  • C7: the header (generated by build/transform.sh) now states the file is a psql script and points non-psql callers at pgtle.install_extension() with the sql/pgque.sql body. ON_ERROR_STOP behavior is kept.
  • C9: pgque.uninstall() detects pg_extension membership first and raises a clear exception pointing to drop extension pgque cascade / sql/pgque-tle-uninstall.sql.

sql/pgque.sql and sql/pgque-tle.sql regenerated via bash build/transform.sh in the commits that touch sources.

Tests (red/green TDD)

  • New tests/test_uninstall_guard.sql (added to tests/run_all.sql): swaps pgque.stop() for instrumented fakes (real definition saved and restored), asserting (1) a raising stop() aborts sql/pgque_uninstall.sql before the schema drop, (2) sql/pgque-tle-uninstall.sql calls pgque.stop(), (3) a raising stop() aborts the TLE script too. Test (1) failed (schema dropped) and test (2) failed (stop never called) at origin/main; all pass after the fix. Runs without pg_cron / pg_tle.
  • New assertion in tests/test_tle_install.sql (pg_tle CI job): pgque.uninstall() on an extension install must raise an error pointing to drop extension pgque cascade, leaving schema and extension intact.

Verification (local PG 16, fresh scratch DBs)

# sqlstate probe for the narrowed handler (basis for C6 fix)
perform pgque.stop()  -- no pgque schema     -> 3F000 invalid_schema_name
perform pgque.stop()  -- schema, no function -> 42883 undefined_function

# red phase at origin/main scripts
psql -d pgque_uninst_red -f tests/test_uninstall_guard.sql
-> ERROR: pgque schema must survive when pgque.stop() raises during uninstall (exit 3)

# green phase after fix
psql -d pgque_uninst_red -v ON_ERROR_STOP=1 -f tests/test_uninstall_guard.sql
-> 4x PASS, exit 0

# C4/C6 idempotency matrix
psql -d <plain-install-db> -v ON_ERROR_STOP=1 -f sql/pgque-tle-uninstall.sql  -> exit 0, schema untouched
psql -d <empty-db> -v ON_ERROR_STOP=1 -f sql/pgque-tle-uninstall.sql          -> exit 0 (run twice, idempotent)
psql -d <plain-install-db> -v ON_ERROR_STOP=1 -f sql/pgque_uninstall.sql      -> exit 0, schema gone
psql -d <same-db> -v ON_ERROR_STOP=1 -f sql/pgque_uninstall.sql               -> exit 0 (idempotent rerun)

# C9: plain uninstall() still works
psql -d <plain-install-db> -c 'select pgque.uninstall();'  -> schema dropped, notice printed

# full suite on a fresh DB at HEAD
createdb pgque_uninst_full
psql -d pgque_uninst_full -v ON_ERROR_STOP=1 -f sql/pgque.sql
psql -d pgque_uninst_full -v ON_ERROR_STOP=1 -f tests/run_all.sql
-> === ALL TESTS PASSED === (157 PASS notices, exit 0)

# generated-file drift check
bash build/transform.sh && git diff --exit-code -- sql/pgque.sql sql/pgque-tle.sql  -> clean

pg_tle / pg_cron are not installed locally; the TLE-specific C9 assertion runs in the dedicated pg_tle CI job, and the new guard tests degrade cleanly without either extension.

Addresses findings C4, C6, C7, C9 of #283

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv


Generated by Claude Code

claude added 4 commits June 10, 2026 13:37
pg_cron / pg_timetable jobs are catalog rows, not dependent objects:
dropping the pgque schema or extension leaves pgque_ticker,
pgque_retry_events, pgque_maint and pgque_rotate_step2 behind, failing
every 1-30 s forever. sql/pgque-tle-uninstall.sql now calls
pgque.stop() before drop extension, and both uninstall scripts perform
the drop in the same do block so a real stop() failure aborts the
uninstall. The previous catch-all handler in sql/pgque_uninstall.sql is
narrowed to undefined_function / invalid_schema_name (the sqlstates
raised when pgque is not installed), keeping the scripts idempotent
without swallowing real errors such as cron.unschedule permission
failures.

Addresses findings C4 and C6 of #283.

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv
When pgque is installed as an extension (pg_tle path), pgque.uninstall()
ran drop schema pgque cascade, which fails with a confusing dependency
error: extension pgque requires it. Detect pg_extension membership first
and raise a clear exception pointing to drop extension pgque cascade and
the pg_tle uninstall script.

Regenerates sql/pgque.sql and sql/pgque-tle.sql; covered by a new
assertion in tests/test_tle_install.sql (runs in the pg_tle CI job).

Addresses finding C9 of #283.

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv
The generated header claimed the file works in GUI tools, JDBC and
libpq-direct callers, but the script uses \set ON_ERROR_STOP and \echo
psql meta-commands, so any non-psql client fails on the first one.
State that it is a psql script and point non-psql callers at
pgtle.install_extension() with the sql/pgque.sql body instead.

Regenerates sql/pgque-tle.sql.

Addresses finding C7 of #283.

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv
In the pg_tle CI job, pgque is installed as an extension. Running
sql/pgque-tle-uninstall.sql from test_uninstall_guard.sql there
(correctly, after the C4/C9 fixes) drops the extension -- and the
whole pgque schema with it -- mid-suite, so the "plain install must
survive" assertion failed. Gate the sub-tests that execute the TLE
uninstall script behind a psql \if on pgque not being an extension
member, emitting the suite's usual SKIP notice; the extension path
of the script is covered by tests/test_tle_install.sql.

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv

Copy link
Copy Markdown
Owner Author

CI follow-up: "pg_tle install path" job fixed in 313a87c.

Cause: in the pg_tle job, pgque is installed as an extension. tests/test_uninstall_guard.sql executed sql/pgque-tle-uninstall.sql mid-suite, and after the C4/C9 fixes the script now (correctly) stops pgque, drops the extension -- taking the whole pgque schema with it -- and unregisters from pg_tle. The test's "plain (non-extension) install must survive" assertion then failed because the install was never plain. The assumption only holds in the plain-install jobs.

Fix: gate the two sub-tests that execute sql/pgque-tle-uninstall.sql behind a psql-level \if on pg_catalog.pg_extension not containing pgque (psql-level because the script is run via \i, which a DO-block skip cannot prevent). When pgque is an extension, the suite emits the usual SKIP: notice and the script is never executed; the pgque.stop() save/restore around the suite is unaffected. Test 1 (sql/pgque_uninstall.sql aborts when stop() raises) is safe under both install modes and stays ungated. The extension path of the TLE uninstall script remains covered by tests/test_tle_install.sql.

Verification (local, plain install, PG 16):

  • fresh scratch DB + psql -v ON_ERROR_STOP=1 -f sql/pgque.sql, then full tests/run_all.sql: === ALL TESTS PASSED === with all three guard sub-tests exercised (PASS notices present, no skips).
  • skip branch simulated (pg_tle is not installable locally) by forcing the gate variable to true in a temp copy: SKIP: notice emitted, no line of pgque-tle-uninstall.sql executed, schema intact, stop() restored, exit 0.

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants