fix(ci): make PG readiness timeout effective#286

Draft
NikolayS wants to merge 2 commits into main from claude/fix-ci-readiness-oqxpbr

@NikolayS

Owner

Bug

Four jobs in .github/workflows/ci.yml (test matrix, python-client, go-client, ts-client) waited for PostgreSQL with:

for i in $(seq 1 30); do
  docker exec <container> pg_isready -U postgres && break
  sleep 1
done || { echo "PG not ready after 30 seconds"; exit 1; }

A for loop's exit status is that of the last command it ran. When PG never becomes ready, the last command is sleep 1, which exits 0 — so the || branch is dead code: the job proceeds silently with PG down instead of failing fast with the timeout diagnostic.
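
The failure mode can be reproduced without Docker. The sketch below is illustrative only: `never_ready` and `old_wait` are hypothetical names standing in for the `docker exec <container> pg_isready -U postgres` call and the workflow step, and the loop is shortened to three iterations with `sleep 0` so it returns immediately.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `docker exec <container> pg_isready -U postgres`
# in the case where PostgreSQL never becomes ready.
never_ready() { return 1; }

# Same shape as the old workflow step (names and iteration count are assumptions).
old_wait() {
  for i in 1 2 3; do
    never_ready && break
    sleep 0   # stands in for `sleep 1`; also exits 0
  done || { echo "PG not ready after 30 seconds"; return 1; }
}

old_wait
old_status=$?
echo "old logic exit status: $old_status"
```

Because the last command the loop runs is the sleep, the loop itself reports success and the `||` block is never entered.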

Additionally, CLAUDE.md's File Organization section listed sql/pgque-unpgque.sql, which does not exist in the repo (separate commit).

Fix

  • Replace the broken wait in all four jobs with the explicit ready-flag pattern already used by the upgrade, tle, pgcron, and pg_timetable jobs in the same file, including a docker logs <container> dump on timeout. Container names, users, and pg_isready options are unchanged; only the wait logic changed.
  • CLAUDE.md: replace the nonexistent sql/pgque-unpgque.sql entry with the actual top-level files under sql/ (pgque_uninstall.sql, pgque-tle.sql, pgque-tle-uninstall.sql).
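
For readers unfamiliar with the ready-flag pattern, here is a minimal sketch of its general shape. It is not copied from ci.yml: the function name `wait_for_pg`, the container name `demo-pg`, the `STUB_PG_READY` variable, and the stubbed `docker` function are all assumptions made so the example runs without Docker.

```shell
#!/usr/bin/env bash
# Stub so the sketch runs without Docker (hypothetical; remove it to call the real CLI).
docker() {
  case "$1" in
    exec) [ "${STUB_PG_READY:-0}" -eq 1 ] ;;          # simulates pg_isready's exit status
    logs) echo "stub: container logs would appear here" ;;
  esac
}

# Wait for PostgreSQL, tracking success in an explicit flag instead of
# relying on the loop's exit status.
wait_for_pg() {
  local container=$1 attempts=${2:-30} delay=${3:-1} ready=0
  for attempt in $(seq 1 "$attempts"); do
    if docker exec "$container" pg_isready -U postgres; then
      ready=1
      break
    fi
    sleep "$delay"
  done
  if [ "$ready" -ne 1 ]; then
    echo "PG not ready after $attempts attempts"
    docker logs "$container"   # dump container logs as a timeout diagnostic
    return 1
  fi
}

# Failure case: the stub never reports readiness.
STUB_PG_READY=0
fail_status=0
wait_for_pg demo-pg 3 0 || fail_status=$?

# Success case: the stub reports readiness on the first attempt.
STUB_PG_READY=1
ok_status=0
wait_for_pg demo-pg 3 0 || ok_status=$?

echo "failure case: $fail_status, success case: $ok_status"
```

The decision to fail is made by an explicit check after the loop, so it no longer depends on which command happened to run last inside the loop body.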

Local verification

The fixed wait logic was extracted verbatim into a script with a stubbed docker function (loop shortened to 3 iterations / 0.1 s sleeps for the test):

=== Case 1: pg_isready always fails (fixed logic) ===
stub: pg_isready -> no response
stub: pg_isready -> no response
stub: pg_isready -> no response
PG not ready after 30 seconds
stub: pg_isready -> no response        # docker logs stub
exit status: 1

=== Case 2: pg_isready succeeds immediately (fixed logic) ===
stub: pg_isready -> accepting connections
wait logic passed, PG ready
real    0m0.005s
exit status: 0

=== Old (broken) logic with always-failing stub, for comparison ===
stub: pg_isready -> no response
stub: pg_isready -> no response
stub: pg_isready -> no response
old logic fell through silently
exit status: 0

The fixed logic exits 1 with the diagnostic after the loop when PG never becomes ready, and exits 0 promptly on success; the old logic exited 0 even on total failure. actionlint is not installed in this environment, so it was not run; the workflow YAML was sanity-parsed with PyYAML (YAML parses OK).

Addresses finding D1 and the stale CLAUDE.md entry from #283.

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv


Generated by Claude Code

claude added 2 commits June 10, 2026 13:30
The readiness wait in four jobs relied on the exit status of a
for-loop, which is that of its last command (sleep 1), so the
|| timeout branch was dead code and the diagnostic never fired
when PostgreSQL failed to start. Replace it with the ready-flag
pattern already used by the upgrade, tle, pgcron, and timetable
jobs, including a docker logs dump on timeout.

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv

The File Organization section listed sql/pgque-unpgque.sql,
which does not exist. Replace it with the actual top-level
files under sql/: pgque_uninstall.sql, pgque-tle.sql, and
pgque-tle-uninstall.sql.

https://claude.ai/code/session_01KAaEGkQZmey1D1xCsVGmqv