Skip to content

feat: per-service health check endpoints for status page monitoring#6442

Open
atlas-agent-omi[bot] wants to merge 2 commits intomainfrom
atlas/service-health-endpoints
Open

feat: per-service health check endpoints for status page monitoring#6442
atlas-agent-omi[bot] wants to merge 2 commits intomainfrom
atlas/service-health-endpoints

Conversation

@atlas-agent-omi
Copy link
Copy Markdown

Summary

Adds unauthenticated health check endpoints to the backend for use with Instatus (or any external status page monitor).

Endpoints

Endpoint Service Provider
GET /v1/health/chat LLM / Chat Anthropic
GET /v1/health/transcription Speech-to-text Deepgram
GET /v1/health/storage Database Firestore
GET /v1/health/ai AI / Embeddings OpenAI
GET /v1/health/search Full-text search Typesense
GET /v1/health/services Aggregate (all above)

Response format

Healthy:

{"status": "ok", "service": "chat", "provider": "anthropic"}

Unhealthy (HTTP 503):

{"status": "down", "service": "chat", "error": "...", "provider": "anthropic"}

Aggregate:

{"status": "ok|degraded|down", "services": { ... }}

Design notes

  • No auth required — endpoints are public for external monitors
  • Each check times out after 5 s; aggregate times out after 10 s
  • Uses lightweight API calls (list models / project list / doc get) — no side effects
  • All exceptions are caught — endpoints never crash
  • Cache-Control: no-cache on all responses
  • Checks run concurrently in /v1/health/services via asyncio.gather

atlas-agent[bot] added 2 commits March 19, 2026 16:06
Backend:
- Add get_user_stats() to database/user_usage.py that computes
  total words, total hours, current/longest streak, and active days
- Add GET /v1/users/me/stats endpoint in routers/users.py

Flutter:
- Add UserStats model with fromJson/toJson
- Add StatsProvider (ChangeNotifier pattern)
- Add getUserStats() API call in users.dart
- Add StatsPage with stat cards, animated fire emoji, and
  90-day activity calendar grid (GitHub-style)
- Add Stats item to settings drawer (after Plan & Usage)
- Register StatsProvider in main.dart
- GET /v1/health/chat       — Anthropic API liveness check
- GET /v1/health/transcription — Deepgram API liveness check
- GET /v1/health/storage    — Firestore connectivity check
- GET /v1/health/ai         — OpenAI API liveness check
- GET /v1/health/search     — Typesense cluster health check
- GET /v1/health/services   — Concurrent aggregate of all checks

All endpoints are unauthenticated, timeout at 5s, and return
Cache-Control: no-cache headers for use with Instatus / external monitors.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR adds unauthenticated per-service health check endpoints (/v1/health/*) for status-page monitoring and a companion Flutter stats page showing user word counts, hours recorded, and a 90-day streak calendar.

  • _AGGREGATE_TIMEOUT (10 s) is defined but never applied to the asyncio.gather in /v1/health/services, contradicting the PR description.
  • Two in-function imports (from database._client import db and import json) in health.py violate the project's explicit no-in-function-import rule.
  • Raw str(exc) is returned in public, unauthenticated responses, potentially leaking internal hostnames, GCP project info, or connection-string fragments.

Confidence Score: 4/5

Not safe to merge as-is — three P1 issues in the new health router need fixing before production.

Three P1 findings block merging: the unused aggregate timeout, two in-function imports that break the project's enforced rule, and unauthenticated exception-string exposure which is a security concern. The Flutter stats feature and the rest of the wiring are clean.

backend/routers/health.py requires the most attention (in-function imports, missing aggregate timeout, raw exception exposure).

Vulnerabilities

  • Information disclosure (backend/routers/health.py lines 57–178): All six health endpoints are unauthenticated and return str(exc) verbatim in the JSON body. Exception messages from httpx, Firestore, or asyncio can include internal hostnames, GCP project IDs, or connection-string fragments visible to any external caller.
  • No secrets are hardcoded; API keys are read from environment variables.
  • No injection, CSRF, or auth-bypass concerns in the added routes.

Important Files Changed

Filename Overview
backend/routers/health.py New unauthenticated health-check router with four issues: two in-function imports (violating project rules), _AGGREGATE_TIMEOUT defined but never applied, and raw exception strings exposed in public responses.
backend/main.py Adds health router import and app.include_router(health.router) — straightforward wiring, no issues.
backend/database/user_usage.py New file adding Firestore-backed hourly usage aggregation and streak calculation for the user stats feature; logic looks correct.
backend/routers/users.py Adds imports for user_usage_db and exposes /v1/users/me/stats and /v1/users/me/usage endpoints; no issues found in the changed portions.
app/lib/pages/settings/stats_page.dart New stats page with streak calendar; all user-facing strings are hardcoded English instead of using l10n keys as required by project rules.
app/lib/backend/http/api/users.dart Adds getUserStats() API call and onboarding helpers; debug print statements left in getUserOnboardingState should be removed before production.
app/lib/providers/stats_provider.dart Thin ChangeNotifier provider wrapping getUserStats(); correctly registered globally in main.dart.
app/lib/models/user_stats.dart New UserStats model with fromJson/toJson; fields match the backend response contract.
app/lib/pages/settings/settings_drawer.dart Adds Stats navigation entry using Navigator.push to StatsPage; StatsProvider is globally scoped so no wrapping needed.
app/lib/main.dart Adds StatsProvider to the global provider tree; straightforward, no issues.

Sequence Diagram

sequenceDiagram
    participant Monitor as Status Monitor
    participant API as FastAPI /v1/health/*
    participant Anthropic
    participant Deepgram
    participant Firestore
    participant OpenAI
    participant Typesense

    Monitor->>API: GET /v1/health/services
    activate API
    par concurrent checks (asyncio.gather)
        API->>Anthropic: GET /v1/models (5s timeout)
        Anthropic-->>API: 200 OK
        API->>Deepgram: GET /v1/projects (5s timeout)
        Deepgram-->>API: 200 OK
        API->>Firestore: doc_ref.get() (5s timeout)
        Firestore-->>API: doc snapshot
        API->>OpenAI: GET /v1/models (5s timeout)
        OpenAI-->>API: 200 OK
        API->>Typesense: GET /health (5s timeout)
        Typesense-->>API: 200 OK
    end
    API-->>Monitor: 200/207/503 JSON aggregate
    deactivate API
Loading

Comments Outside Diff (1)

  1. app/lib/backend/http/api/users.dart, line 599-601 (link)

    P2 Debug print statements left in production code

    These print('DEBUG ...') calls will appear in production logs and expose the full API URL and raw response body. Replace with Logger.debug(...) (the pattern used throughout this file) or remove entirely.

Reviews (1): Last reviewed commit: "Add per-service health check endpoints f..." | Re-trigger Greptile

Comment thread backend/routers/health.py
@router.get("/v1/health/storage")
async def health_storage():
try:
from database._client import db # noqa: PLC0415
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 In-function import violates project import rules

from database._client import db is imported inside the function body. The project's CLAUDE.md rule states: "All imports must be at the module top level. Never import inside functions." The # noqa: PLC0415 suppressor acknowledges the violation but doesn't fix it. Move this import to the top of the file alongside the other top-level imports. The same applies to import json on line 172 inside _check.

Suggested change
from database._client import db # noqa: PLC0415
from database._client import db

(Move to the top-level import section of the file, then remove the inline import and noqa comment.)

Context Used: Backend Python import rules - no in-function impor... (source)

Comment thread backend/routers/health.py
Comment on lines +172 to +173
import json
return json.loads(body)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 In-function import — move json to module level

import json is inside _check(), which also violates the "no in-function imports" rule in CLAUDE.md. json is a stdlib module easily added to the top-level imports.

Suggested change
import json
return json.loads(body)
body = result.body
return json.loads(body)

Add import json to the top-level imports block at the top of the file.

Context Used: Backend Python import rules - no in-function impor... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment thread backend/routers/health.py
Comment on lines +183 to +189
checks = await asyncio.gather(
_check("chat", health_chat()),
_check("transcription", health_transcription()),
_check("storage", health_storage()),
_check("ai", health_ai()),
_check("search", health_search()),
return_exceptions=False,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 _AGGREGATE_TIMEOUT defined but never applied

The PR description explicitly states "aggregate times out after 10 s", and _AGGREGATE_TIMEOUT = 10.0 is defined at module level — but it is never used. The asyncio.gather here has no outer timeout, contradicting the documented behaviour.

Suggested change
checks = await asyncio.gather(
_check("chat", health_chat()),
_check("transcription", health_transcription()),
_check("storage", health_storage()),
_check("ai", health_ai()),
_check("search", health_search()),
return_exceptions=False,
checks = await asyncio.wait_for(
asyncio.gather(
_check("chat", health_chat()),
_check("transcription", health_transcription()),
_check("storage", health_storage()),
_check("ai", health_ai()),
_check("search", health_search()),
return_exceptions=False,
),
timeout=_AGGREGATE_TIMEOUT,
)

Comment thread backend/routers/health.py
Comment on lines +57 to +58
except Exception as exc:
return _down("chat", str(exc), provider="anthropic")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 security Raw exception strings exposed in unauthenticated public responses

str(exc) is returned verbatim in the JSON body of every _down() call, and all these endpoints are explicitly unauthenticated. Exception messages from httpx, google.cloud.firestore, or asyncio can include internal hostnames, connection-string fragments, or GCP project metadata. The same pattern appears on lines 83, 106, 131, 158, and 178.

Log the full exception server-side and return only a generic message:

Suggested change
except Exception as exc:
return _down("chat", str(exc), provider="anthropic")
return _down("chat", "upstream error", provider="anthropic")

appBar: AppBar(
backgroundColor: Colors.black,
title: const Text('Stats', style: TextStyle(color: Colors.white)),
iconTheme: const IconThemeData(color: Colors.white),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Hardcoded user-facing strings — l10n required

The Flutter localization rule requires all user-facing strings to use context.l10n. StatsPage has multiple hardcoded English strings: 'Stats' (line 51), 'Words Spoken', 'Hours Recorded', 'Day Streak' (lines 117–121), 'Longest streak: … days' (line 97), 'Activity', 'Last 90 days' (lines 225–231), 'Less' / 'More' (lines 303, 310), 'Retry' (line 72), and 'No stats available' (line 83). All must be added to the ARB files and referenced via context.l10n.keyName.

Context Used: Flutter localization - all user-facing strings mus... (source)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants