Start here for system design. This page is the single entry point for all Django apps in Boost Data Collector: what each package does, where data lives, how apps depend on each other, and where to read next.
Last verified: 2026-05-26 (against develop).
For diagrams and ingest flow, see Architecture_data_flow.md. For FK/import detail, see cross-app-dependencies.md. For first-day setup, see Onboarding.md.
- One Django project, one PostgreSQL database (`boost_dashboard`); all project apps share `config/settings.py`.
- Collectors are `management/commands` (e.g. `run_boost_mailing_list_tracker`). Production batches use `boost_collector_runner` → `run_scheduled_collectors` reading `config/boost_collector_schedule.yaml` (see Workflow.md).
- Writes to an app’s models go only through that app’s `services.py` (CONTRIBUTING.md).
- Cross-app imports are constrained by import-linter (see cross-app-dependencies.md §5).
The ticket term “15 Django apps” means all domain/orchestration packages below except core (shared library). This doc covers 16 rows: 15 domain apps + core.
| Package | Role | Models / services | Deep dive |
|---|---|---|---|
| `core` | Collector contracts (`AbstractCollector`, `BaseCollectorCommand`), structured errors, `core.operations` (GitHub, Slack, files, markdown) | No ORM; not a data domain | core/README.md, Core_public_API.md |
| `boost_collector_runner` | Resolves YAML schedule; runs `run_scheduled_collectors` | `services.py` for group run status only | boost_collector_runner/README.md, Workflow.md |
Columns: persistence (usual durable stores), coupling (one-line upstream → downstream), docs (app README, service API, schema).
| App | Role | Models / `services.py` | Persistence | Coupling (summary) | Docs |
|---|---|---|---|---|---|
| `core` | Shared collector + operations library | No models | N/A | Used by all collectors | README, Core_public_API |
| `boost_collector_runner` | YAML / Celery orchestration | Run status only | N/A | Invokes all `run_*` commands | README, service_api |
| `cppa_user_tracker` | Identity hub (GitHub, Slack, Discord, WG21, mailing list, YouTube profiles) | Yes | PostgreSQL | Upstream: none (hub). Downstream: all person-attributed trackers | README, service_api, Schema § Overview |
| `github_activity_tracker` | GitHub repos, commits, issues, PRs; Language/License reference | Yes | PostgreSQL, workspace | Upstream: cppa_user_tracker. Downstream: Boost, usage, clang pipelines | README, service_api |
| `boost_library_tracker` | Boost catalog, versions, dependencies, maintainer roles | Yes | PostgreSQL, workspace | Upstream: github_activity_tracker, cppa_user_tracker. Downstream: docs, usage | README, service_api |
| `boost_library_docs_tracker` | Boost documentation crawl and doc rows | Yes | PostgreSQL, workspace | Upstream: boost_library_tracker. Downstream: cppa_pinecone_sync | README, service_api |
| `boost_library_usage_dashboard` | Dashboard / aggregation (shim; no local domain models) | Reads peers; no generated service_api | PostgreSQL, workspace exports | Upstream: boost_usage_tracker, others. Downstream: reporting only | README (no service_api page) |
| `boost_usage_tracker` | External repos using Boost headers | Yes | PostgreSQL, workspace | Upstream: github_activity_tracker, boost_library_tracker. Downstream: dashboard | README, service_api |
| `boost_mailing_list_tracker` | Mailing list archives | Yes | PostgreSQL, workspace | Upstream: cppa_user_tracker. Downstream: optional Pinecone | README, service_api |
| `cppa_pinecone_sync` | Vector upserts, fail lists, sync status | Yes | PostgreSQL, Pinecone | Upstream: doc/GitHub/mailing collectors. Downstream: Pinecone index | README, service_api, Pinecone_preprocess_guideline |
| `clang_github_tracker` | LLVM/Clang GitHub activity | Yes | PostgreSQL, workspace | Upstream: github_activity_tracker (via sync_api), cppa_user_tracker | README, service_api |
| `cppa_slack_tracker` | Slack teams, channels, messages | Yes | PostgreSQL, workspace | Upstream: cppa_user_tracker | README, service_api |
| `discord_activity_tracker` | Discord servers, channels, messages | Yes | PostgreSQL, workspace | Upstream: cppa_user_tracker | README, service_api |
| `wg21_paper_tracker` | WG21 papers and authors | Yes | PostgreSQL, workspace | Upstream: cppa_user_tracker | README, service_api |
| `cppa_youtube_script_tracker` | YouTube metadata and transcripts | Yes | PostgreSQL, workspace | Upstream: cppa_user_tracker | README, service_api |
| `slack_event_handler` | Slack Socket Mode listener (PR bot / huddles); long-running, not YAML batch | No ORM / no `services.py` | Workspace JSON, GitHub optional | Upstream: Slack events. Downstream: GitHub MD via operations | README (no service_api page) |
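
The coupling column can also be read as a dependency graph. The sketch below is illustrative only: it copies the upstream entries from the table into a hypothetical `UPSTREAM` dict and uses the standard-library `graphlib` module (Python 3.9+) to derive one order in which every app appears after its upstreams. Expanding "doc/GitHub/mailing collectors" into three apps for `cppa_pinecone_sync` and dropping "others" for the dashboard are assumptions, and the result is not necessarily the order used in `config/boost_collector_schedule.yaml`.

```python
from graphlib import TopologicalSorter

# Hypothetical map: app -> apps it reads from (upstream), copied from the
# coupling column. core, boost_collector_runner and slack_event_handler are
# left out because they own no domain tables.
UPSTREAM = {
    "cppa_user_tracker": set(),  # identity hub, no upstream
    "github_activity_tracker": {"cppa_user_tracker"},
    "boost_library_tracker": {"github_activity_tracker", "cppa_user_tracker"},
    "boost_library_docs_tracker": {"boost_library_tracker"},
    "boost_usage_tracker": {"github_activity_tracker", "boost_library_tracker"},
    "boost_library_usage_dashboard": {"boost_usage_tracker"},  # "others" omitted
    "boost_mailing_list_tracker": {"cppa_user_tracker"},
    # Assumption: "doc/GitHub/mailing collectors" means these three apps.
    "cppa_pinecone_sync": {
        "boost_library_docs_tracker",
        "github_activity_tracker",
        "boost_mailing_list_tracker",
    },
    "clang_github_tracker": {"github_activity_tracker", "cppa_user_tracker"},
    "cppa_slack_tracker": {"cppa_user_tracker"},
    "discord_activity_tracker": {"cppa_user_tracker"},
    "wg21_paper_tracker": {"cppa_user_tracker"},
    "cppa_youtube_script_tracker": {"cppa_user_tracker"},
}


def dependency_order(upstream=UPSTREAM):
    """Return the apps in an order where each one follows all of its upstreams."""
    # TopologicalSorter takes {node: predecessors}; static_order() yields
    # predecessors before the nodes that depend on them.
    return list(TopologicalSorter(upstream).static_order())
```

Because `cppa_user_tracker` is the only app without an upstream, it always comes first in `dependency_order()`; the relative order of unrelated apps is not significant.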
Primary scheduled commands (YAML / Celery batch via config/boost_collector_schedule.yaml; non-exhaustive — see Workflow.md):
| App | Typical `run_*` command |
|---|---|
| `boost_collector_runner` | `run_scheduled_collectors` |
| `cppa_user_tracker` | `run_cppa_user_tracker` |
| `github_activity_tracker` | (via boost_library_tracker) `run_boost_github_activity_tracker` |
| `boost_library_tracker` | `run_boost_github_activity_tracker`, `collect_boost_libraries`, … |
| `boost_library_docs_tracker` | `run_boost_library_docs_tracker` |
| `boost_library_usage_dashboard` | `run_boost_library_usage_dashboard` |
| `boost_usage_tracker` | `run_boost_usage_tracker`, `run_update_created_repos_by_language`, … |
| `boost_mailing_list_tracker` | `run_boost_mailing_list_tracker` |
| `cppa_pinecone_sync` | `run_cppa_pinecone_sync` |
| `clang_github_tracker` | `run_clang_github_tracker` |
| `cppa_slack_tracker` | `run_cppa_slack_tracker` |
| `discord_activity_tracker` | `run_discord_activity_tracker` |
| `wg21_paper_tracker` | `run_wg21_paper_tracker` |
| `cppa_youtube_script_tracker` | `run_cppa_youtube_script_tracker` |
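
To run one collector by hand rather than through the schedule, the usual Django pattern is `python manage.py <command>`. A minimal sketch, assuming it is run from the directory that contains `manage.py`; the `COMMANDS` dict and the `collector_argv` helper are hypothetical and copy only a few one-to-one rows from the table above.

```python
import sys

# Hypothetical lookup: app label -> its run_* management command,
# taken from the one-to-one rows of the table.
COMMANDS = {
    "cppa_user_tracker": "run_cppa_user_tracker",
    "boost_library_docs_tracker": "run_boost_library_docs_tracker",
    "boost_mailing_list_tracker": "run_boost_mailing_list_tracker",
    "cppa_pinecone_sync": "run_cppa_pinecone_sync",
    "wg21_paper_tracker": "run_wg21_paper_tracker",
}


def collector_argv(app, extra_args=()):
    """Build the argv list that runs one app's collector through manage.py."""
    # sys.executable keeps the call inside the current virtualenv's Python.
    return [sys.executable, "manage.py", COMMANDS[app], *extra_args]
```

The returned list can be passed to `subprocess.run(..., check=True)`; any options a given command accepts are not covered here and should be checked with `--help`.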
Long-running entrypoint services (not in the YAML schedule; run as a persistent process, e.g. Compose / runserver integration):
| App | Entry command | Notes |
|---|---|---|
| `slack_event_handler` | `run_slack_event_handler` | Slack Socket Mode listener (PR bot / huddles); see Docker.md §4b |
cppa_user_tracker owns Identity and profile tables. Trackers that attribute activity to people hold FKs into this app (intentional hub — see cross-app-dependencies.md §1).
- `github_activity_tracker`: raw GitHub mirror and reference data.
- `boost_library_tracker`: catalog tied to GitHub repos/files (MTI/OneToOne into github models).
- `boost_library_docs_tracker`: documentation rows keyed to library versions.
- `boost_usage_tracker` / `boost_library_usage_dashboard`: usage and reporting downstream.
Collectors persist to PostgreSQL and/or the workspace; `cppa_pinecone_sync` (and some in-command sync phases) then upserts embeddings into Pinecone. Namespace and field rules: Pinecone_preprocess_guideline.md.
- Service layer: All creates/updates/deletes for an app’s models go through that app’s `services.py`. Index: Service_API.md, service_api/README.md.
- Schema vs behavior: Table diagrams in Schema.md; import/FK matrix in cross-app-dependencies.md.
- Import linting: Run `lint-imports` locally; config in `.importlinter`. Regenerate import tables: `python scripts/list_cross_app_imports.py`.
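
As a rough illustration of what an import table captures, the sketch below lists the imports in a piece of source text that cross app boundaries. It is not the project's `scripts/list_cross_app_imports.py` and does not replace `lint-imports`; the `cross_app_imports` helper and the shortened `PROJECT_APPS` set are hypothetical, and `Identity` in the usage note is only an example name.

```python
import ast

# Hypothetical, shortened set of app labels; the real project has more.
PROJECT_APPS = {
    "core",
    "cppa_user_tracker",
    "github_activity_tracker",
    "boost_library_tracker",
}


def cross_app_imports(source, own_app, project_apps=PROJECT_APPS):
    """Return sorted module names imported from project apps other than own_app."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports, which stay inside the same app.
            modules = [node.module]
        else:
            continue
        for module in modules:
            top_level = module.split(".")[0]
            if top_level in project_apps and top_level != own_app:
                found.add(module)
    return sorted(found)
```

For example, scanning a file in `github_activity_tracker` that contains `from cppa_user_tracker.models import Identity` reports `cppa_user_tracker.models`, while `import os` and `from . import services` are ignored.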
- New collector app: `python manage.py startcollector <app_label>`; see CONTRIBUTING.md § Creating a new collector and How_to_add_a_collector.md.
- Schedule: Add tasks to `config/boost_collector_schedule.yaml` (Workflow.md).
- New cross-app coupling: Update cross-app-dependencies.md and ensure import-linter contracts still pass.
High-level data movement (sources → collectors → DB / workspace → Pinecone) and orchestration:
- Architecture_data_flow.md §1–2 — Mermaid flowcharts and per-app persistence table.
| Topic | Doc |
|---|---|
| Onboarding / mental model | Onboarding.md |
| PR reviews / CODEOWNERS | CODEOWNERS_and_branch_protection.md |
| 1:1 walkthrough runbooks | onboarding/ |
| Bus-factor ticket checklist | BUS_FACTOR_DELIVERABLES.md |