
fix(matrix-bot): fail-soft on stale MATRIX_DATABASE_URL (Wave 25 mitigation)#642

Open
gHashTag wants to merge 1 commit into main from feat/matrix-bot-fail-soft


gHashTag (Owner) commented May 9, 2026

What

Make Matrix Bot (#446 live matrix) fail-soft on psycopg2.OperationalError.

Why

The hourly workflow has been failing every cycle since ~10:00Z on 2026-05-09 because secrets.MATRIX_DATABASE_URL still points at interchange.proxy.rlwy.net:30942 (the legacy Railway proxy host, stale since the SSOT consolidation). Reference run 25602533773:

psycopg2.OperationalError: connection to server at "interchange.proxy.rlwy.net" (66.33.22.238),
port 30942 failed: FATAL:  password authentication failed for user "postgres"

Each failure pages the apiary cron with a NEW-CI-failure-on-new-SHA alert. This has already triggered three queen notifications today.
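Because the root cause is a stale secret rather than a code bug, it helps to confirm which endpoint the DSN targets without echoing the password. The sketch below is hypothetical and not part of this PR: it assumes a URL-style DSN (postgresql://user:password@host:port/db) rather than a libpq key=value string, and the helper names and the LEGACY_ENDPOINT constant are illustrative.

```python
from urllib.parse import urlsplit

# Illustrative constant: the legacy Railway proxy endpoint named in this PR.
LEGACY_ENDPOINT = ("interchange.proxy.rlwy.net", 30942)


def describe_dsn(dsn: str) -> str:
    """Return 'host:port' for a URL-style DSN; user and password are never included."""
    parts = urlsplit(dsn)
    return f"{parts.hostname}:{parts.port}"


def points_at_legacy_host(dsn: str) -> bool:
    """True when the DSN still targets the legacy proxy host and port."""
    parts = urlsplit(dsn)
    return (parts.hostname, parts.port) == LEGACY_ENDPOINT
```

For example, describe_dsn(os.environ["MATRIX_DATABASE_URL"]) would print only host:port, which is safe to include in CI logs.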

How

matrix_bot.py:

  • catches psycopg2.OperationalError specifically (auth + connection refused)
  • logs the failure to stderr, including the rotation hint
  • returns 0 by default; strict mode is restored with MATRIX_FAIL_SOFT=0
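The matrix_bot.py behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the run() and fail_soft_enabled() helpers, the injected connect callable, and the exact hint wording are hypothetical. It assumes that any MATRIX_FAIL_SOFT value other than "0" means fail-soft, and it uses exit code 3 for strict mode because that is what the local smoke test reports. A stand-in exception class is defined only so the sketch runs where psycopg2 is not installed.

```python
import os
import sys
from typing import Callable

try:
    from psycopg2 import OperationalError
except ImportError:  # stand-in so the sketch runs without psycopg2 installed
    class OperationalError(Exception):
        """Placeholder for psycopg2.OperationalError (sketch only)."""

# Exit code observed for the fail-hard path in the local smoke test (exit=3).
EXIT_SSOT_UNAVAILABLE = 3


def fail_soft_enabled() -> bool:
    """MATRIX_FAIL_SOFT defaults to '1'; assumption: only an explicit '0' means strict."""
    return os.environ.get("MATRIX_FAIL_SOFT", "1") != "0"


def run(connect: Callable[[], object]) -> int:
    """Open the SSOT connection and return the process exit code."""
    try:
        conn = connect()
    except OperationalError as exc:
        # Auth failures and refused connections both surface as OperationalError;
        # any other exception type still propagates and fails the job.
        print(f"matrix_bot: SSOT connection failed (OperationalError): {exc}", file=sys.stderr)
        # Hypothetical hint wording pointing at the rotation follow-up.
        print("matrix_bot: hint: rotate secrets.MATRIX_DATABASE_URL (see #641).", file=sys.stderr)
        if fail_soft_enabled():
            print("matrix_bot: MATRIX_FAIL_SOFT=1 -> exiting 0 without PATCHing #446.", file=sys.stderr)
            return 0
        return EXIT_SSOT_UNAVAILABLE
    # Normal path (omitted in this sketch): query `conn` and PATCH the #446 live matrix.
    return 0
```

In a script this would be wired up as sys.exit(run(lambda: psycopg2.connect(os.environ["MATRIX_DATABASE_URL"]))). Injecting the connect callable keeps the error handling testable without a live database.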

matrix-bot.yml:

  • explicit MATRIX_FAIL_SOFT: "1" env on the regen step
  • comment explaining how to revert to strict mode once the secret is rotated

Verified locally

$ MATRIX_DATABASE_URL=... GITHUB_TOKEN=dummy MATRIX_FAIL_SOFT=1 python3 matrix_bot.py
matrix_bot: SSOT connection failed (OperationalError): ...
matrix_bot: MATRIX_FAIL_SOFT=1 -> exiting 0 without PATCHing #446.
exit=0

$ MATRIX_FAIL_SOFT=0 python3 matrix_bot.py
exit=3

Both paths behave as intended. Default behaviour change: the cron stops paging on a stale DSN.
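The two local runs above could be repeated with a small harness that runs the bot once per MATRIX_FAIL_SOFT value and compares exit codes. This is a hypothetical helper, not part of the PR; it assumes the stale MATRIX_DATABASE_URL and a dummy GITHUB_TOKEN are already exported in the calling environment, and it only overrides MATRIX_FAIL_SOFT.

```python
import os
import subprocess
from typing import Dict, Sequence

# Expected exit code for each MATRIX_FAIL_SOFT value, per the local runs above.
EXPECTED: Dict[str, int] = {"1": 0, "0": 3}


def smoke(cmd: Sequence[str]) -> Dict[str, int]:
    """Run cmd once per MATRIX_FAIL_SOFT value and collect the exit codes."""
    observed: Dict[str, int] = {}
    for flag in EXPECTED:
        # Inherit the caller's environment and override only the flag.
        env = dict(os.environ, MATRIX_FAIL_SOFT=flag)
        observed[flag] = subprocess.run(list(cmd), env=env).returncode
    return observed
```

A check such as smoke([sys.executable, "matrix_bot.py"]) == EXPECTED would then cover both the fail-soft and fail-hard paths in one call.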

Acceptance

| Gate | Status |
|---|---|
| Python ast.parse | clean |
| Local smoke (fail-soft) | ✅ exit 0 |
| Local smoke (fail-hard) | ✅ exit 3 |
| Hourly cron stops paging once merged | will verify via apiary on next cycle |
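The "Python ast.parse" gate in the table checks syntax without executing the module. A minimal sketch using the standard-library ast module follows; the helper names are hypothetical and this is not necessarily how the PR's CI step is implemented.

```python
import ast
from pathlib import Path
from typing import Iterable, List, Optional


def syntax_error(source: str, filename: str = "<string>") -> Optional[str]:
    """Return 'filename:line: message' if source fails ast.parse, else None."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


def check_files(paths: Iterable[Path]) -> List[str]:
    """Parse each file without executing it and collect any syntax errors."""
    errors: List[str] = []
    for path in paths:
        message = syntax_error(Path(path).read_text(encoding="utf-8"), str(path))
        if message is not None:
            errors.append(message)
    return errors
```

For example, check_files(Path(".").rglob("*.py")) returns an empty list when every file parses cleanly.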

Follow-up

Tracked in #641 — rotate secrets.MATRIX_DATABASE_URL to phd-postgres-ssot and flip MATRIX_FAIL_SOFT back to "0".

R-discipline

R3 PR-only · R4 trace · R5 honest (no fake green) · R10 atomic.

Anchor: phi^2 + phi^-2 = 3 · DOI 10.5281/zenodo.19227877.

Closes #641-mitigation (does NOT close #641 — secret rotation still required).

Hourly cron (matrix_bot @ :07) has been red since 2026-05-09T~10:00Z because
secrets.MATRIX_DATABASE_URL still points at interchange.proxy.rlwy.net:30942
(legacy DB, post-SSOT-consolidation). Each failure pages the apiary cron with
a NEW-CI-failure attribution.

This change makes the bot R5-honest fail-soft:
  - catches psycopg2.OperationalError specifically (auth / connection refused)
  - logs the failure to stderr including a hint to rotate the secret
  - returns 0 by default (controlled by MATRIX_FAIL_SOFT, default '1')
  - keeps the legacy hard-fail path available via MATRIX_FAIL_SOFT=0

Once secrets.MATRIX_DATABASE_URL is rotated to the live phd-postgres-ssot
DSN (tracked in a separate ONE SHOT) the workflow can flip MATRIX_FAIL_SOFT
back to '0' to restore strict mode.

Anchor: phi^2 + phi^-2 = 3 . DOI 10.5281/zenodo.19227877
gHashTag added the bug (Something isn't working) and P1 labels on May 9, 2026
gHashTag (Owner, Author) commented May 9, 2026

✅ All 13 required checks report SUCCESS. The PR state is BLOCKED only by the branch-protection rule requiring 1 approving review (per trios/main rules).

Awaiting queen review. Once merged, the next hourly Matrix Bot run will exit 0 with the rotation hint instead of failing the cron, so the apiary alarms will stop. Live #446 updates remain paused until #641 (secret rotation) is also resolved.

Anchor: φ² + φ⁻² = 3


Labels

bug (Something isn't working), P1


Development

Successfully merging this pull request may close these issues.

🎯 ONE SHOT — Wave 25 · L-MATRIX-DSN-ROTATE: rotate secrets.MATRIX_DATABASE_URL to phd-postgres-ssot
