Add /healthz (liveness) and /readyz (readiness) HTTP probes #6

Merged: Aidaho12 merged 1 commit into roxy-wi:main from megative:feat/health-probes on Jun 12, 2026.

@megative (Contributor)

Summary

  • Adds two unauthenticated probes: /healthz (liveness, always 200, no DB) and /readyz (readiness, runs SELECT 1 and checks pending migrations, 200 or 503).
  • Both endpoints live outside /api/, so the existing auth middleware bypasses them with no change.
  • The only change to existing code is a 2-line guard in before_request that skips the implicit db.connect() for the probe paths, so /healthz keeps answering when the database is down.

This unblocks a Kubernetes Helm chart and any HTTP load balancer (HAProxy / nginx / ELB) that today can only fall back to a TCP socket check.

Test plan

  • tests/test_health.py — 10 new cases covering happy path, DB outage (mocked), pending migrations (mocked), migration-check exceptions, no-auth requirement, and a regression guard that both paths stay outside /api/*. All pass locally.
  • Full local pytest run: 405 passed, 4 failed. The 4 failures are unrelated to this change: they are in tests/test_incidents_api.py and tests/test_maintenance_windows.py, they already fail on main, and neither file is modified here.
  • Manual curl against a running server, both healthy and with the DB pointed at an unreadable path:

| Scenario | /healthz | /readyz |
|---|---|---|
| Healthy | 200 {"status":"ok"} | 200 {"status":"ready","database":"ok",…} |
| DB unreachable | 200 {"status":"ok"} | 503 {"status":"not_ready","database":"error","database_error":"unable to open database file"} |
| POST /api/integrations/alertmanager (sanity) | n/a | 401 (/api/* auth is untouched) |

  • CI Tests workflow on this PR (Python 3.10 / 3.11 / 3.12).
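The manual curl check can be approximated in Python with only the standard library. This is a sketch, not part of the PR: probe() and in_rotation() are hypothetical helpers, and the base URL is whatever address the server listens on. It shows the one wrinkle an HTTP client hits with /readyz: urllib raises HTTPError on a 503, but the error object still carries the JSON body.

```python
import json
import urllib.error
import urllib.request

# Probes should hit the backend directly, so ignore any http_proxy settings.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 2.0) -> tuple[int, dict]:
    """GET a probe endpoint and return (status code, decoded JSON body)."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # urlopen-style openers raise on 4xx/5xx; for /readyz a 503 is an
        # expected answer and its body still holds the not_ready payload.
        try:
            return err.code, json.loads(err.read())
        finally:
            err.close()


def in_rotation(base_url: str) -> bool:
    """Hypothetical load-balancer decision: keep the backend only if /readyz is 200."""
    try:
        status, _ = probe(base_url.rstrip("/") + "/readyz")
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body: take it out.
        return False
    return status == 200
```

Treating only 200 as ready is a choice made for this sketch; a Kubernetes httpGet probe counts any status from 200 to 399 as success, so the 503 returned by /readyz is what actually removes the pod from service endpoints.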

This is an AI-assisted PR.

Adds two unauthenticated HTTP endpoints intended for orchestrators and
load balancers:

- /healthz (liveness): always returns 200 with {"status": "ok"} as
  long as the process can serve a request. It deliberately does NOT
  touch the database: a database outage must not cause Kubernetes to
  restart the pod, because restarting the pod cannot fix the database.

- /readyz (readiness): runs SELECT 1 against the configured database
  and verifies that all on-disk migrations have been applied. Returns
  200 with {"status": "ready", ...} on success, or 503 with a
  structured payload when the database is unreachable or pending
  migrations exist, so the load balancer drops the pod from rotation
  until the process is fully usable.
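The two behaviours above can be sketched without a web framework, assuming a SQLite database (the "unable to open database file" error in the manual test is SQLite's message). The function names, the hypothetical migrations table with a name column, the one-file-per-migration directory layout, and the pending_migrations / migrations_error payload keys are illustrative assumptions, not Roxy-WI's actual code. Each handler returns a (status, payload) pair for the framework to serialize as JSON.

```python
import sqlite3
from pathlib import Path


def healthz() -> tuple[int, dict]:
    # Liveness: never touches the database, so a DB outage cannot make
    # the orchestrator restart the process.
    return 200, {"status": "ok"}


def pending_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    # Assumption: one *.py file per migration on disk, and a hypothetical
    # "migrations" table recording the names already applied.
    on_disk = {p.stem for p in migrations_dir.glob("*.py") if p.stem != "__init__"}
    applied = {row[0] for row in conn.execute("SELECT name FROM migrations")}
    return sorted(on_disk - applied)


def readyz(db_path: str, migrations_dir: Path) -> tuple[int, dict]:
    # Readiness: open a dedicated connection so any DB error becomes a 503.
    # mode=rw stops SQLite from silently creating a missing database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=rw"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        return 503, {"status": "not_ready", "database": "error", "database_error": str(exc)}
    try:
        try:
            conn.execute("SELECT 1").fetchone()
        except sqlite3.Error as exc:
            return 503, {"status": "not_ready", "database": "error", "database_error": str(exc)}
        try:
            pending = pending_migrations(conn, migrations_dir)
        except Exception as exc:  # the migration check itself failed
            return 503, {"status": "not_ready", "database": "ok", "migrations_error": str(exc)}
        if pending:
            return 503, {"status": "not_ready", "database": "ok", "pending_migrations": pending}
        return 200, {"status": "ready", "database": "ok", "pending_migrations": []}
    finally:
        conn.close()
```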

Both endpoints live outside the /api/ namespace, so the existing
api_auth_required_for_path() helper bypasses authentication without
any middleware change.
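Why no middleware change is needed can be illustrated with a hypothetical reimplementation of the path check; the real api_auth_required_for_path() is not shown in this PR and may differ. The sketch matches /api and /api/... but not a look-alike such as /apiary, which a bare startswith("/api") would wrongly catch.

```python
PROBE_PATHS = ("/healthz", "/readyz")


def api_auth_required_for_path(path: str) -> bool:
    # Hypothetical reimplementation for illustration: only the /api
    # namespace requires authentication.
    return path == "/api" or path.startswith("/api/")


def probes_bypass_auth() -> bool:
    # Regression guard in the spirit of the PR's tests: neither probe may
    # ever fall under /api/*.
    return not any(api_auth_required_for_path(p) for p in PROBE_PATHS)
```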

The before_request hook is taught to skip its implicit db.connect()
for the two probe paths: /healthz must remain answerable when the DB
is down, and /readyz manages its own connection explicitly so it can
return a clean 503 on DB errors.
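A sketch of that guard, written as a plain function so it runs without a framework. In a Flask app the same check would live in a function registered with app.before_request and read request.path; PROBE_PATHS, the function name, and passing connect in as a callable are assumptions made for this example.

```python
from typing import Callable

PROBE_PATHS = frozenset({"/healthz", "/readyz"})


def before_request(path: str, connect: Callable[[], None]) -> None:
    # /healthz must answer even when the database is down, and /readyz
    # opens its own connection so it can turn DB errors into a 503.
    if path in PROBE_PATHS:
        return
    # Every other route keeps the implicit per-request connection.
    connect()
```

Matching exact paths rather than a prefix keeps the bypass narrow: a future route such as /readyz-debug would still get the normal connection.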

Tests (tests/test_health.py, 10 cases) cover:
- /healthz returns 200 even when init_database is patched to raise;
- /healthz needs no auth;
- /readyz returns 200 on the happy path (DB ok, no pending migrations);
- /readyz returns 503 when SELECT 1 raises (DB outage);
- /readyz returns 503 when a fake pending migration appears on disk;
- /readyz returns 503 when the migration check itself raises;
- both probes are outside /api/* (regression guard).
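The mocking approach behind the outage cases can be sketched in pytest style with unittest.mock. The db namespace, its init_database and select_one attributes, and the handlers below are hypothetical stand-ins, not the code under test in tests/test_health.py; the point is that one simulated outage must leave /healthz at 200 while /readyz reports 503.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the app's database module.
db = SimpleNamespace(init_database=lambda: None, select_one=lambda: 1)


def healthz() -> tuple[int, dict]:
    return 200, {"status": "ok"}  # never touches db


def readyz() -> tuple[int, dict]:
    try:
        db.init_database()
        db.select_one()
    except Exception as exc:
        return 503, {"status": "not_ready", "database": "error", "database_error": str(exc)}
    return 200, {"status": "ready", "database": "ok"}


def test_healthz_survives_db_outage():
    with mock.patch.object(db, "init_database", side_effect=RuntimeError("db down")):
        assert healthz() == (200, {"status": "ok"})


def test_readyz_returns_503_on_db_outage():
    outage = RuntimeError("unable to open database file")
    with mock.patch.object(db, "select_one", side_effect=outage):
        status, body = readyz()
    assert status == 503
    assert body["database_error"] == "unable to open database file"


def test_readyz_happy_path():
    assert readyz() == (200, {"status": "ready", "database": "ok"})
```

pytest collects the test_* functions directly; plain assert statements are all it needs, and mock.patch.object restores the original attribute when each with block exits.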

All 10 new tests pass locally.
Aidaho12 merged commit 5011de5 into roxy-wi:main on Jun 12, 2026. 3 checks passed.