From fca0ee9659504b826398f4be2be3dcde055fad8d Mon Sep 17 00:00:00 2001 From: Leo Chen Date: Wed, 27 May 2026 08:10:21 -0700 Subject: [PATCH 1/2] feat: introduce stability policy and documentation updates --- CHANGELOG.md | 4 + CONTRIBUTING.md | 1 + README.md | 4 + STABILITY.md | 116 ++++++++++++++++++++++++++++ cppa_pinecone_sync/sync_api.py | 2 + docs/Core_public_API.md | 2 + docs/Onboarding.md | 1 + docs/README.md | 1 + github_activity_tracker/sync_api.py | 2 + 9 files changed, 133 insertions(+) create mode 100644 STABILITY.md diff --git a/CHANGELOG.md b/CHANGELOG.md index a8a85ad..100f92d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added + +- **Stability policy** ([`STABILITY.md`](STABILITY.md)): documents stable vs evolving vs unstable interfaces for production and contributors; README links to it. + ### Changed - **core.collectors:** Removed deprecated `CollectorBase` and `DjangoCommandCollector`; the supported collector contract is **`AbstractCollector`** + **`BaseCollectorCommand`** (see docs). diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0de1bb9..29c9d82 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -139,6 +139,7 @@ uv run python benchmarks/compare_to_baseline.py bench.json benchmarks/baselines. - **Code style:** Use Python 3.13 and follow Django and project conventions. Use the project’s logging (`logging.getLogger(__name__)`). Before pushing, run **`uv run pyright`** (with dev deps) for the paths covered by **`pyrightconfig.json`**, and ensure CI’s **lint** / **pyright** / **test** jobs would pass. - **Database:** Use the Django ORM and migrations. Writes only through the service layer as above. - **Docs:** Update this file (and app `services.py` docstrings) when adding new apps or changing the write rules. 
After changing `services.py` or `core/protocols.py`, run `python scripts/generate_service_docs.py` and commit the updated `docs/service_api/` files. +- **Stability:** Pull requests that change `sync_api.__all__`, the `/health/` JSON contract, or management command names used in `config/boost_collector_schedule.yaml` must update [STABILITY.md](STABILITY.md) and [CHANGELOG.md](CHANGELOG.md) when the change is user-visible. ## Related documentation diff --git a/README.md b/README.md index 2c4ec56..84ceba9 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,10 @@ Boost Data Collector is a Django project that collects and manages data from var **Responsible disclosure:** do not open a public GitHub Issue for undisclosed security problems. Read **[`SECURITY.md`](SECURITY.md)** for supported versions, in-scope components, how to report privately (**GitHub Security**; email only when an address is published there), response timelines, and **credential rotation** guidance (GitHub tokens, Slack, Discord, Pinecone, YouTube, browser session material, Django `SECRET_KEY`, database URLs). +## Stability + +Releases follow [Semantic Versioning](https://semver.org/). While the project is on **`0.x`**, tagged releases on **`main`** (e.g. `v0.1.0`) still define a practical stability contract for production: scheduled management commands, `/health/`, documented environment variable names, schedule YAML shape, and cross-app `sync_api` / `core` imports. Everything else may change without notice. See **[`STABILITY.md`](STABILITY.md)** for stable vs evolving vs unstable interfaces and release bump rules. + ## Critical environment variables Authoritative names, examples, and comments live in **[`.env.example`](.env.example)**. 
Typical values you must set for a working local or deployed stack: diff --git a/STABILITY.md b/STABILITY.md new file mode 100644 index 0000000..9001322 --- /dev/null +++ b/STABILITY.md @@ -0,0 +1,116 @@ +# Stability policy + +This document defines **which interfaces we treat as stable** for production deployments and in-repo contributors. It complements [Semantic Versioning](https://semver.org/) as described in [CHANGELOG.md](CHANGELOG.md). + +Boost Data Collector is a **deployed Django application**, not a published PyPI library. Stability commitments apply to **operations** (commands, health checks, configuration) and **documented cross-app Python surfaces**, not to arbitrary imports from tracker apps. + +## Audience + +| Audience | What this doc covers | +| --- | --- | +| **Operators** | Docker Compose, Celery Beat, migrations, `GET /health/`, environment variables, schedule YAML | +| **Contributors** | `core` public API, `*.sync_api` modules, import-linter boundaries | +| **Out of scope** | Third-party code importing undocumented tracker modules | + +## Versioning and branches + +- **Version numbers** follow [SemVer](https://semver.org/spec/v2.0.0.html). Release tags (e.g. `v0.1.0`) on **`main`** define production-aligned versions. See [CHANGELOG.md](CHANGELOG.md) for the release checklist and [pyproject.toml](pyproject.toml) (`[tool.setuptools_scm]`) for how `core.__version__` is derived. +- **`0.x` releases:** SemVer treats `0.y.z` as initial development. This policy adds a **practical production contract** for Tier A interfaces below so tagged releases on **`main`** (e.g. `v0.1.0`) are predictable for operators even before `1.0.0`. +- **Branches:** Deploy production from **`main`** at a **git tag**. **`develop`** is the integration branch (GitHub default for pull requests); it may change operational behavior until changes are promoted to **`main`**. 
See [README.md](README.md#branching-strategy) and [SECURITY.md](SECURITY.md#supported-versions) for security-fix flow (`develop` → `main`). +- **We do not backport** stability or feature changes to older tags unless maintainers explicitly agree. + +### Release bump rules + +| Bump | When | Tier A (stable) interfaces | +| --- | --- | --- | +| **PATCH** (`0.1.x`) | Bug fixes | No intentional breaking changes | +| **MINOR** (`0.2.0`) | Backward-compatible additions | New optional health fields, new schedule tasks, new `sync_api` exports allowed | +| **MAJOR** (`1.0.0`) | Breaking changes to Tier A | Requires CHANGELOG entry and migration notes | + +## Interface tiers + +### Tier A — Stable + +Breaking changes require a **major** release (or an explicit deprecation period documented in CHANGELOG). Pull requests that change these surfaces must update this file and [CHANGELOG.md](CHANGELOG.md) when behavior or names change. + +| Interface | Stability commitment | Reference | +| --- | --- | --- | +| **Management commands** in the production schedule | Command names referenced by `config/boost_collector_schedule.yaml` are stable; renaming or removing a scheduled command is breaking | [config/boost_collector_schedule.yaml.example](config/boost_collector_schedule.yaml.example), [docs/Workflow.md](docs/Workflow.md) | +| **Health endpoint** | `GET /health/` response shape and HTTP semantics (see below) | [config/health.py](config/health.py) | +| **Environment variables** | Names documented in [`.env.example`](.env.example); renames need a deprecation note for at least one release | `.env.example` | +| **Schedule YAML shape** | Top-level `groups`, per-group `default_time` and `tasks`, task `command` and `schedule` keys consumed by `boost_collector_runner` | [docs/Workflow.md](docs/Workflow.md) | +| **`core` public Python API** | Imports listed in Core public API doc | [docs/Core_public_API.md](docs/Core_public_API.md) | +| **Cross-app `sync_api` modules** | Only symbols in each 
module’s `__all__` | [github_activity_tracker/sync_api.py](github_activity_tracker/sync_api.py), [cppa_pinecone_sync/sync_api.py](cppa_pinecone_sync/sync_api.py); enforced by [`.importlinter`](.importlinter) | + +#### Scheduled management commands (example schedule) + +These command names appear in [config/boost_collector_schedule.yaml.example](config/boost_collector_schedule.yaml.example). Production `config/boost_collector_schedule.yaml` may add tasks but should not rename scheduled commands without a major release note: + +- `run_boost_usage_tracker` +- `run_update_created_repos_by_language` +- `run_boost_github_activity_tracker` +- `run_boost_library_usage_dashboard` +- `run_clang_github_tracker` +- `collect_boost_libraries` +- `run_wg21_paper_tracker` +- `run_cppa_slack_tracker` +- `run_discord_activity_tracker` +- `run_boost_mailing_list_tracker` + +Other `manage.py` commands exist for manual runs, backfills, and development; only commands **listed in your deployed schedule YAML** are Tier A for that deployment. + +#### Health endpoint contract + +- **URL:** `GET /health/` +- **Success:** HTTP **200** when database and Celery worker checks pass and collector freshness rules pass (see `HEALTH_ENFORCE_COLLECTOR_FRESHNESS` in settings). +- **Failure:** HTTP **503** when critical checks fail. +- **Auth (optional):** If `HEALTH_CHECK_TOKEN` is set, requests must send `Authorization: Bearer `; otherwise HTTP **401** with `{"status": "unauthorized", "detail": "..."}`. +- **Stable JSON keys:** top-level `status` (`healthy` \| `unhealthy`); `checks` with `database`, `celery_workers`, `collector_groups`, `collector_meta`, `pinecone_sync`. New optional keys may be added in minor releases; removing or renaming existing keys is breaking. 
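Review note: the top-level health contract above can be checked mechanically. The following is a minimal sketch, assuming the response body has already been parsed into a dict; the function name `health_contract_problems` is hypothetical and not part of the repository. It tolerates extra keys, since the policy allows new optional keys in minor releases.

```python
from typing import Any

# Tier A keys as listed in the health endpoint contract above.
STABLE_STATUS_VALUES = {"healthy", "unhealthy"}
STABLE_CHECK_KEYS = {
    "database",
    "celery_workers",
    "collector_groups",
    "collector_meta",
    "pinecone_sync",
}


def health_contract_problems(payload: dict[str, Any]) -> list[str]:
    """Return a list of contract problems; an empty list means the stable keys are present.

    Intended for 200/503 bodies only. The 401 body uses status "unauthorized"
    and is outside this check.
    """
    problems: list[str] = []
    if payload.get("status") not in STABLE_STATUS_VALUES:
        problems.append(f"unexpected status: {payload.get('status')!r}")
    checks = payload.get("checks")
    if not isinstance(checks, dict):
        problems.append("missing or non-object 'checks'")
        return problems
    # Extra check keys are allowed; only missing stable keys are reported.
    for key in sorted(STABLE_CHECK_KEYS - set(checks)):
        problems.append(f"missing check: {key}")
    return problems
```

A check like this could run in a smoke test after deploy, so that removing or renaming a listed key is caught before operators notice.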
+ +#### Cross-app `sync_api` exports + +**`github_activity_tracker.sync_api`** — `build_issue_document`, `build_pr_document`, `fetcher`, path helpers (`get_*_json_path`, `get_raw_source_*_path`, `iter_existing_*_jsons`), `normalize_issue_json`, `normalize_pr_json`, `save_*_raw_source`. + +**`cppa_pinecone_sync.sync_api`** — `PineconeInstance`, `PreprocessFn`, `sync_to_pinecone`. + +Other tracker apps must not import `fetcher`, `sync`, `ingestion`, `services`, `workspace`, or `preprocessors` directly where [`.importlinter`](.importlinter) forbids it. + +### Tier B — Evolving + +Supported in production with **forward migrations** and **CHANGELOG** notes. Not treated as import-stable across minor releases. + +| Interface | Policy | +| --- | --- | +| **PostgreSQL schema** | Changed only via Django migrations; every deploy runs `python manage.py migrate` | +| **`services.py` functions** | Per-app write API; signatures may change in minor `0.x` releases when [docs/service_api/](docs/service_api/) and all callers are updated together. Cross-app reads should use **`services`** or **`sync_api`**, not foreign models (see [CONTRIBUTING.md](CONTRIBUTING.md)) | + +### Tier C — Unstable + +No compatibility promise. May change in any release without deprecation. + +- Direct `Model.objects` queries or ORM access outside an app’s `services.py` (except intentional identity-hub FKs documented in [docs/cross-app-dependencies.md](docs/cross-app-dependencies.md)). +- Imports of tracker internals bypassing `sync_api` (e.g. `github_activity_tracker.fetcher`, `cppa_pinecone_sync.sync` from apps covered by import-linter). +- Workspace directory layouts under `WORKSPACE_DIR`, except paths explicitly documented in [`.env.example`](.env.example) and [docs/Workspace.md](docs/Workspace.md). +- `slack_event_handler` internals, management commands not in your schedule, scripts under `scripts/`, tests, and Django admin customization. 
+ +## Deprecation + +- Prefer **additive** changes in minor releases. +- **Python:** emit `DeprecationWarning` and document in CHANGELOG (example: `CollectorBase` → `AbstractCollector` in [docs/Core_public_API.md](docs/Core_public_API.md)). +- **Configuration:** document env var renames in CHANGELOG and keep the old name commented in `.env.example` for at least one release cycle when feasible. +- **Breaking removals** of Tier A interfaces target **`1.0.0`**, except urgent security mitigations (see [SECURITY.md](SECURITY.md)). + +## Production deployments + +1. Build and deploy from **`main`** at a **git tag** (e.g. `v0.1.0`); pin the image or git SHA in production. +2. Run migrations after deploy (`manage.py migrate --noinput`). +3. Verify readiness: `curl -fsS http:///health/` (see [docs/GCP_Production_Checklist.md](docs/GCP_Production_Checklist.md)). +4. Optional: log `core.__version__` for support correlation. +5. Do not assume arbitrary commits on **`develop`** meet Tier A guarantees until they are released on **`main`**. + +## Related documentation + +- [docs/Core_public_API.md](docs/Core_public_API.md) — stable `core` imports +- [CONTRIBUTING.md](CONTRIBUTING.md) — service layer and contributor rules +- [CHANGELOG.md](CHANGELOG.md) — release notes and semver +- [SECURITY.md](SECURITY.md) — supported versions and vulnerability reporting diff --git a/cppa_pinecone_sync/sync_api.py b/cppa_pinecone_sync/sync_api.py index 4169c1b..50d718c 100644 --- a/cppa_pinecone_sync/sync_api.py +++ b/cppa_pinecone_sync/sync_api.py @@ -3,6 +3,8 @@ Other tracker apps must import orchestration from this module only — not ``sync``, ``ingestion``, or ``services`` directly. + +Stability: only symbols in ``__all__`` are Tier A; see STABILITY.md at the repo root. 
""" from __future__ import annotations diff --git a/docs/Core_public_API.md b/docs/Core_public_API.md index 8fc505d..3aeb4f0 100644 --- a/docs/Core_public_API.md +++ b/docs/Core_public_API.md @@ -1,5 +1,7 @@ # Core package: stable public surfaces +Stability guarantees for these imports are defined in [STABILITY.md](../STABILITY.md) (Tier A). + The `core` Django app holds shared infrastructure. Treat the following as the **supported internal API** for collectors and cross-app helpers. Other modules under `core/` may change without notice; prefer importing from the paths below. ## Collectors diff --git a/docs/Onboarding.md b/docs/Onboarding.md index b3f131e..b6f3f36 100644 --- a/docs/Onboarding.md +++ b/docs/Onboarding.md @@ -102,4 +102,5 @@ Historically, collectors evolved separately: some use plain **`BaseCommand`**, w |-------|-----| | Add / register a collector | [How_to_add_a_collector.md](How_to_add_a_collector.md) | | Stable `core` imports | [Core_public_API.md](Core_public_API.md) | +| Stability policy | [STABILITY.md](../STABILITY.md) | | Full doc index | [README.md](README.md) (this folder) | diff --git a/docs/README.md b/docs/README.md index 93e413d..dddc631 100644 --- a/docs/README.md +++ b/docs/README.md @@ -8,6 +8,7 @@ Documentation is organized **by topic**, not by app. Each doc covers one cross-c |-------|-----|---------| | **Onboarding** | [Onboarding.md](Onboarding.md) | First-day orientation: mental model, app roles, data dependencies, where patterns differ. | | **Architecture overview** | [Architecture_overview.md](Architecture_overview.md) | **Start here for system design:** all 15 domain apps + `core`, persistence, coupling, links to app READMEs and service API. | +| **Stability** | [STABILITY.md](../STABILITY.md) | Stable vs evolving vs unstable interfaces; semver and production release expectations. | | **Workflow** | [Workflow.md](Workflow.md) | Main application workflow, execution order, and project details. 
| | **Architecture (data flow)** | [Architecture_data_flow.md](Architecture_data_flow.md) | Data flow (sources → collectors → DB / workspace → Pinecone), orchestration diagram, per-app component map. | | **Tutorial: building a collector** | [Tutorial_building_a_collector.md](Tutorial_building_a_collector.md) | End-to-end walkthrough: `startcollector`, hooks, tests, YAML/Celery, deploy. | diff --git a/github_activity_tracker/sync_api.py b/github_activity_tracker/sync_api.py index e772b84..2b6538d 100644 --- a/github_activity_tracker/sync_api.py +++ b/github_activity_tracker/sync_api.py @@ -4,6 +4,8 @@ Other tracker apps (e.g. clang_github_tracker) must import orchestration helpers from this module only — not ``fetcher``, ``sync.*``, ``workspace``, or ``preprocessors`` directly. + +Stability: only symbols in ``__all__`` are Tier A; see STABILITY.md at the repo root. """ from __future__ import annotations From 7ff5d3884aba46236a1fb153c5cd52b499cc126a Mon Sep 17 00:00:00 2001 From: Leo Chen Date: Fri, 29 May 2026 07:38:03 -0700 Subject: [PATCH 2/2] docs: update README and STABILITY.md for clarity on branch roles and stability policies --- README.md | 6 +-- STABILITY.md | 91 ++++++++++++++++++++++++++++++---- docs/cross-app-dependencies.md | 2 + 3 files changed, 85 insertions(+), 14 deletions(-) diff --git a/README.md b/README.md index 7ff2132..c873ba0 100644 --- a/README.md +++ b/README.md @@ -325,7 +325,7 @@ See **[docs/Deployment.md](docs/Deployment.md)** for: **GitHub’s configured default branch for this repository is `develop`.** -- **main** – Default/production branch (stable, release-ready code). -- **develop** – Development branch (active integration and feature work). +- **develop** – GitHub default branch; integration and pull-request target; deploys to **staging** (see [docs/Deployment.md](docs/Deployment.md)). +- **main** – **Production** branch; release tags and Tier A stability contract ([`STABILITY.md`](STABILITY.md)); deploys to **production**. 
- Feature branches: Create from `develop`. Do not branch from `main` for day-to-day work. -- Pull requests: Open PRs against `develop`; merge to `main` for releases. +- Pull requests: Open PRs against `develop`; promote to `main` for releases. diff --git a/STABILITY.md b/STABILITY.md index 9001322..719da41 100644 --- a/STABILITY.md +++ b/STABILITY.md @@ -16,7 +16,9 @@ Boost Data Collector is a **deployed Django application**, not a published PyPI - **Version numbers** follow [SemVer](https://semver.org/spec/v2.0.0.html). Release tags (e.g. `v0.1.0`) on **`main`** define production-aligned versions. See [CHANGELOG.md](CHANGELOG.md) for the release checklist and [pyproject.toml](pyproject.toml) (`[tool.setuptools_scm]`) for how `core.__version__` is derived. - **`0.x` releases:** SemVer treats `0.y.z` as initial development. This policy adds a **practical production contract** for Tier A interfaces below so tagged releases on **`main`** (e.g. `v0.1.0`) are predictable for operators even before `1.0.0`. -- **Branches:** Deploy production from **`main`** at a **git tag**. **`develop`** is the integration branch (GitHub default for pull requests); it may change operational behavior until changes are promoted to **`main`**. See [README.md](README.md#branching-strategy) and [SECURITY.md](SECURITY.md#supported-versions) for security-fix flow (`develop` → `main`). +- **GitHub default branch** is **`develop`** (where pull requests merge). **Production stability** is defined by **git tags on `main`**, not by every commit on `develop`. +- **Branches:** Deploy production from **`main`** at a **git tag**. **`develop`** is the integration branch; it may change operational behavior until changes are promoted to **`main`**. See [README.md](README.md#branching-strategy) and [SECURITY.md](SECURITY.md#supported-versions) for security-fix flow (`develop` → `main`). +- **Staging:** Pushes to **`develop`** deploy to staging per [docs/Deployment.md](docs/Deployment.md). 
Staging may run commits not yet on a production tag. **Tier A guarantees apply to tagged `main` releases**; do not assume every `develop` HEAD meets them until promoted and released. - **We do not backport** stability or feature changes to older tags unless maintainers explicitly agree. ### Release bump rules @@ -36,15 +38,20 @@ Breaking changes require a **major** release (or an explicit deprecation period | Interface | Stability commitment | Reference | | --- | --- | --- | | **Management commands** in the production schedule | Command names referenced by `config/boost_collector_schedule.yaml` are stable; renaming or removing a scheduled command is breaking | [config/boost_collector_schedule.yaml.example](config/boost_collector_schedule.yaml.example), [docs/Workflow.md](docs/Workflow.md) | -| **Health endpoint** | `GET /health/` response shape and HTTP semantics (see below) | [config/health.py](config/health.py) | -| **Environment variables** | Names documented in [`.env.example`](.env.example); renames need a deprecation note for at least one release | `.env.example` | -| **Schedule YAML shape** | Top-level `groups`, per-group `default_time` and `tasks`, task `command` and `schedule` keys consumed by `boost_collector_runner` | [docs/Workflow.md](docs/Workflow.md) | -| **`core` public Python API** | Imports listed in Core public API doc | [docs/Core_public_API.md](docs/Core_public_API.md) | +| **Health endpoint** | `GET /health/` response shape and HTTP semantics (see [Health endpoint contract](#health-endpoint-contract)) | [config/health.py](config/health.py) | +| **Environment variables** | Documented names in [`.env.example`](.env.example) are stable on rename (deprecation required); see [minimum operational set](#tier-a-environment-variables-minimum-operational-set) | `.env.example`, [README.md](README.md#critical-environment-variables) | +| **Schedule YAML shape** | Keys and structure in [Schedule YAML (Tier A keys)](#schedule-yaml-tier-a-keys) | 
[docs/Workflow.md](docs/Workflow.md) | +| **`core` public Python API** | **Entire** [docs/Core_public_API.md](docs/Core_public_API.md) — collectors, `core.errors`, and `core.protocols` | [docs/Core_public_API.md](docs/Core_public_API.md) | | **Cross-app `sync_api` modules** | Only symbols in each module’s `__all__` | [github_activity_tracker/sync_api.py](github_activity_tracker/sync_api.py), [cppa_pinecone_sync/sync_api.py](cppa_pinecone_sync/sync_api.py); enforced by [`.importlinter`](.importlinter) | #### Scheduled management commands (example schedule) -These command names appear in [config/boost_collector_schedule.yaml.example](config/boost_collector_schedule.yaml.example). Production `config/boost_collector_schedule.yaml` may add tasks but should not rename scheduled commands without a major release note: +**Orchestration (Tier A):** + +- **`run_scheduled_collectors`** — runs tasks from `config/boost_collector_schedule.yaml`. Stable CLI flags: `--schedule`, `--group`, `--strict`, `--day-of-week`, `--day-of-month`, `--interval-minutes` (behavior in [docs/Workflow.md](docs/Workflow.md)). +- **`boost_collector_runner.tasks.run_scheduled_collectors_task`** — Celery entry point; stable behavior (delegates to the management command with equivalent kwargs). Operators normally do not invoke this directly. + +These per-collector command names appear in [config/boost_collector_schedule.yaml.example](config/boost_collector_schedule.yaml.example). 
Production `config/boost_collector_schedule.yaml` may add tasks but should not rename scheduled commands without a major release note: - `run_boost_usage_tracker` - `run_update_created_repos_by_language` @@ -57,7 +64,34 @@ These command names appear in [config/boost_collector_schedule.yaml.example](con - `run_discord_activity_tracker` - `run_boost_mailing_list_tracker` -Other `manage.py` commands exist for manual runs, backfills, and development; only commands **listed in your deployed schedule YAML** are Tier A for that deployment. +Other `manage.py` commands exist for manual runs, backfills, and development; only commands **listed in your deployed schedule YAML** (plus **`run_scheduled_collectors`**) are Tier A for that deployment. + +#### Schedule YAML (Tier A keys) + +| Level | Stable keys | +| --- | --- | +| Top-level | `groups` | +| Per group | `default_time`, `tasks` | +| Per task | `command`, `schedule` (values: `daily`, `weekly`, `monthly`, `interval`, `on_release`) | +| Per task (optional, Tier A when present) | `enabled` (default: enabled if omitted; `enabled: false` skips the task), `args` (list), `minutes` (interval), `on` / `day_of_week` / `day_of_month` (weekly/monthly) | + +New **optional** task keys may be added in minor releases. Changing the meaning of existing keys is breaking. + +#### Tier A environment variables (minimum operational set) + +[`.env.example`](.env.example) is the authoritative list of documented variable **names**. All documented names follow the rename/deprecation policy below, but not every key is required for every environment. 
+ +| Area | Variables | +| --- | --- | +| Core | `DATABASE_URL`, `SECRET_KEY`, `DEBUG`, `WORKSPACE_DIR` | +| Celery | `CELERY_BROKER_URL`, `CELERY_RESULT_BACKEND` | +| GitHub | `GITHUB_TOKEN`, `GITHUB_TOKENS_SCRAPING`, `GITHUB_TOKEN_WRITE` | +| Health | `HEALTH_CHECK_TOKEN`, `HEALTH_ENFORCE_COLLECTOR_FRESHNESS`, `HEALTH_CELERY_MIN_WORKERS`, `HEALTH_CELERY_INSPECT_TIMEOUT`, `HEALTH_COLLECTOR_STALE_HOURS` | +| Schedule | `BOOST_COLLECTOR_SCHEDULE_YAML`, `BOOST_COLLECTOR_SCHEDULE_STRICT` | + +Other variables in `.env.example` remain **name-stable** on rename but may be optional, integration-specific, or dev-only. + +**Rename policy:** env var renames need a deprecation note for at least one release (see [Deprecation](#deprecation)). #### Health endpoint contract @@ -65,16 +99,38 @@ Other `manage.py` commands exist for manual runs, backfills, and development; on - **Success:** HTTP **200** when database and Celery worker checks pass and collector freshness rules pass (see `HEALTH_ENFORCE_COLLECTOR_FRESHNESS` in settings). - **Failure:** HTTP **503** when critical checks fail. - **Auth (optional):** If `HEALTH_CHECK_TOKEN` is set, requests must send `Authorization: Bearer `; otherwise HTTP **401** with `{"status": "unauthorized", "detail": "..."}`. -- **Stable JSON keys:** top-level `status` (`healthy` \| `unhealthy`); `checks` with `database`, `celery_workers`, `collector_groups`, `collector_meta`, `pinecone_sync`. New optional keys may be added in minor releases; removing or renaming existing keys is breaking. +- **Top-level (Tier A):** `status` (`healthy` \| `unhealthy`); `checks` object with keys `database`, `celery_workers`, `collector_groups`, `collector_meta`, `pinecone_sync`. New optional top-level or check keys may be added in minor releases; removing or renaming listed keys is breaking. + +##### Health endpoint — nested JSON (Tier A) + +| Check key | Stable shape | +| --- | --- | +| `database` | Always `ok` (bool). On success: `latency_ms` (int). 
On failure: `error` (string). | +| `celery_workers` | `ok`, `workers` (list), `responded`, `expected`; on failure `error`. | +| `collector_groups` | **Dynamic map:** keys are schedule **group ids** (deployment-specific). Per entry: `last_success_at` (ISO 8601 string or null), `stale` (bool or null for groups not on a daily schedule). Key names are **not** fixed across deployments. | +| `collector_meta` | `any_stale`, `enforce_freshness`, `error` (optional), `skipped` (optional string when the database check failed). | +| `pinecone_sync` | **Dynamic map:** keys are `app_type` values from the database; per entry `final_sync_at` (ISO 8601 string or null). The whole object may be `error` or `skipped` when the check failed or was skipped. Key names are **not** fixed across deployments. | + +Implementation: [config/health.py](config/health.py). #### Cross-app `sync_api` exports -**`github_activity_tracker.sync_api`** — `build_issue_document`, `build_pr_document`, `fetcher`, path helpers (`get_*_json_path`, `get_raw_source_*_path`, `iter_existing_*_jsons`), `normalize_issue_json`, `normalize_pr_json`, `save_*_raw_source`. +**`github_activity_tracker.sync_api`** — `build_issue_document`, `build_pr_document`, `fetcher`, `get_commit_json_path`, `get_issue_json_path`, `get_pr_json_path`, `get_raw_source_issue_path`, `get_raw_source_pr_path`, `iter_existing_commit_jsons`, `iter_existing_issue_jsons`, `iter_existing_pr_jsons`, `normalize_issue_json`, `normalize_pr_json`, `save_commit_raw_source`, `save_issue_raw_source`, `save_pr_raw_source`. **`cppa_pinecone_sync.sync_api`** — `PineconeInstance`, `PreprocessFn`, `sync_to_pinecone`. Other tracker apps must not import `fetcher`, `sync`, `ingestion`, `services`, `workspace`, or `preprocessors` directly where [`.importlinter`](.importlinter) forbids it. 
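Review note: import-linter enforces boundaries at the module level, while the "only symbols in `__all__`" rule is symbol-level. As a rough illustration of how the symbol-level rule could be checked, here is a sketch that uses only the standard-library `ast` module. The `STABLE_EXPORTS` table and the function name are hypothetical; the export set is copied by hand from the list above rather than read from the real module.

```python
import ast

# Hypothetical table: Tier A exports copied from the section above.
# In a real check this would be derived from each module's __all__.
STABLE_EXPORTS = {
    "cppa_pinecone_sync.sync_api": {
        "PineconeInstance",
        "PreprocessFn",
        "sync_to_pinecone",
    },
}


def non_tier_a_imports(source: str) -> list[str]:
    """Return `module.name` for each from-import of a sync_api symbol outside the stable set."""
    offenders: list[str] = []
    for node in ast.walk(ast.parse(source)):
        # Only `from <module> import ...` statements targeting a known sync_api module.
        if isinstance(node, ast.ImportFrom) and node.module in STABLE_EXPORTS:
            allowed = STABLE_EXPORTS[node.module]
            offenders.extend(
                f"{node.module}.{alias.name}"
                for alias in node.names
                if alias.name not in allowed
            )
    return offenders
```

The sketch only inspects `from ... import ...` statements; attribute access through `import cppa_pinecone_sync.sync_api` would need a separate pass.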
+#### Cross-app surfaces (summary) + +| Surface | Tier | Rule | +| --- | --- | --- | +| `*.sync_api` | **A** | Import only symbols in `__all__` | +| `{app}.services` | **B** | Allowed cross-app reads/writes per [CONTRIBUTING.md](CONTRIBUTING.md); signatures may change in `0.x` minors | +| Tracker internals (`fetcher`, `sync`, ORM outside `services`) | **C** | Forbidden where [`.importlinter`](.importlinter) says so | + +See [docs/cross-app-dependencies.md](docs/cross-app-dependencies.md) for the full coupling map. + ### Tier B — Evolving Supported in production with **forward migrations** and **CHANGELOG** notes. Not treated as import-stable across minor releases. @@ -90,16 +146,24 @@ No compatibility promise. May change in any release without deprecation. - Direct `Model.objects` queries or ORM access outside an app’s `services.py` (except intentional identity-hub FKs documented in [docs/cross-app-dependencies.md](docs/cross-app-dependencies.md)). - Imports of tracker internals bypassing `sync_api` (e.g. `github_activity_tracker.fetcher`, `cppa_pinecone_sync.sync` from apps covered by import-linter). -- Workspace directory layouts under `WORKSPACE_DIR`, except paths explicitly documented in [`.env.example`](.env.example) and [docs/Workspace.md](docs/Workspace.md). +- Workspace directory layouts under `WORKSPACE_DIR`, except paths explicitly documented in [`.env.example`](.env.example) and [docs/Workspace.md](docs/Workspace.md). **Per-app JSON schemas** under `workspace/` are not stable. +- Docker Compose service names (`web`, `celery_worker`, `celery_beat`) and host ports are not Tier A unless documented here in a future release. - `slack_event_handler` internals, management commands not in your schedule, scripts under `scripts/`, tests, and Django admin customization. ## Deprecation - Prefer **additive** changes in minor releases. 
-- **Python:** emit `DeprecationWarning` and document in CHANGELOG (example: `CollectorBase` → `AbstractCollector` in [docs/Core_public_API.md](docs/Core_public_API.md)). +- **Python:** emit `DeprecationWarning`, document in CHANGELOG, and keep the old symbol for at least one release cycle when feasible. - **Configuration:** document env var renames in CHANGELOG and keep the old name commented in `.env.example` for at least one release cycle when feasible. - **Breaking removals** of Tier A interfaces target **`1.0.0`**, except urgent security mitigations (see [SECURITY.md](SECURITY.md)). +## How this policy is enforced + +This policy is not honor-system only: + +- **import-linter** — `lint-imports` (pre-commit and CI) enforces import contracts in [`.importlinter`](.importlinter), implementing Tier C boundaries between tracker apps. +- **`scripts/check_service_layer_writes.py`** — pre-commit and CI; flags ORM writes outside the owning app’s `services.py` (see [CONTRIBUTING.md](CONTRIBUTING.md)). + ## Production deployments 1. Build and deploy from **`main`** at a **git tag** (e.g. `v0.1.0`); pin the image or git SHA in production. @@ -111,6 +175,11 @@ No compatibility promise. May change in any release without deprecation. 
## Related documentation - [docs/Core_public_API.md](docs/Core_public_API.md) — stable `core` imports +- [docs/Workflow.md](docs/Workflow.md) — schedule types and `run_scheduled_collectors` +- [docs/cross-app-dependencies.md](docs/cross-app-dependencies.md) — import/FK boundaries +- [docs/Deployment.md](docs/Deployment.md) — staging vs production deploys +- [docs/GCP_Production_Checklist.md](docs/GCP_Production_Checklist.md) — production readiness +- [`.env.example`](.env.example) — authoritative env names (see [minimum operational set](#tier-a-environment-variables-minimum-operational-set)) - [CONTRIBUTING.md](CONTRIBUTING.md) — service layer and contributor rules - [CHANGELOG.md](CHANGELOG.md) — release notes and semver - [SECURITY.md](SECURITY.md) — supported versions and vulnerability reporting diff --git a/docs/cross-app-dependencies.md b/docs/cross-app-dependencies.md index ff6a2ff..8423051 100644 --- a/docs/cross-app-dependencies.md +++ b/docs/cross-app-dependencies.md @@ -6,6 +6,8 @@ ForeignKey from one tracker app into another's models" — visible and therefore For **typed data boundaries** (run results, activity rows, checkpoints) shared across apps, prefer :mod:`core.protocols` (see [Core_public_API.md](Core_public_API.md#tracker-protocols-dtos)). +Stability tiers for imports and operational contracts: [STABILITY.md](../STABILITY.md). + **Re-generate the import tables** after large refactors: ```bash