GCP Public Health & Clinic Trend Tracker

A scalable, serverless data pipeline designed to correlate internal clinic symptom logs with external public health trends (CDC FluView). This platform empowers clinical operations to predict patient surges and optimize resource allocation.

Architecture

The system follows a modern ELT (Extract, Load, Transform) architecture on Google Cloud Platform:

  1. Ingestion: Airflow (Cloud Composer) orchestrates data fetching from the CDC FluView API (via CMU Delphi) and internal clinic logs (a minimal DAG sketch follows this list).
  2. Data Lake: Raw JSON/CSV files are stored in Google Cloud Storage (GCS) for auditability.
  3. Data Warehouse: BigQuery stores raw data and executes transformations.
  4. Transformation: dbt (Data Build Tool) cleans, models, and aggregates data into "Marts" for reporting.
  5. Visualization: Metabase (on Cloud Run) provides interactive dashboards for trend analysis.
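To make the ingestion step concrete, the sketch below shows the shape of a weekly FluView fetch. The Delphi Epidata endpoint is real, but the bucket name, DAG id, and epiweek parameter are illustrative placeholders; the actual DAGs live in dags/.

    # Sketch of step 1: fetch FluView data from the Delphi Epidata API and
    # land the raw JSON in the GCS data lake. Bucket and DAG names are assumed.
    import pendulum
    import requests
    from airflow.decorators import dag, task
    from google.cloud import storage

    @dag(
        schedule="0 12 * * 3",  # Wednesdays (hour assumed)
        start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
        catchup=False,
    )
    def cdc_ingestion_weekly_sketch():
        @task
        def fetch_fluview_to_gcs() -> str:
            resp = requests.get(
                "https://api.delphi.cmu.edu/epidata/fluview/",
                params={"regions": "nat", "epiweeks": "202401"},  # illustrative epiweek
                timeout=60,
            )
            resp.raise_for_status()
            # Store the raw payload in GCS for auditability (step 2)
            bucket = storage.Client().bucket("health-raw-data")  # assumed bucket name
            blob = bucket.blob("cdc_fluview/202401.json")
            blob.upload_from_string(resp.text, content_type="application/json")
            return f"gs://{bucket.name}/{blob.name}"

        fetch_fluview_to_gcs()

    cdc_ingestion_weekly_sketch()

From there, a load job moves the raw files into BigQuery, and dbt handles the in-warehouse transformations (steps 3 and 4).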

Getting Started

Prerequisites

Ensure you have the following installed and available:

  • Python 3
  • Terraform
  • A GCP project with billing enabled, plus a Service Account key

Installation

  1. Clone the Repository

    git clone https://github.com/forceliuss/disease-trend-pipeline.git
    cd disease-trend-pipeline
  2. Set Up Virtual Environment

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
  3. Configure Environment

     Place your Service Account key (downloaded from the GCP Console) in the secrets/ directory:

    # Place your service account key file
    mkdir -p secrets
    cp /path/to/your/key.json secrets/gcp-sa-key.json

    Then create a .env file (replace the project ID with your own):

    echo "GOOGLE_APPLICATION_CREDENTIALS=./secrets/gcp-sa-key.json" >> .env
    echo "GCP_PROJECT_ID=health-project-486811" >> .env

Infrastructure Deployment (Terraform)

Provision the GCP resources (GCS buckets, BigQuery datasets, Cloud Composer/Airflow):

  1. Initialize Terraform

    cd terraform
    terraform init
  2. Plan and Apply

    terraform plan -out=tfplan
    terraform apply tfplan

    Note: This will create billable resources on GCP. Ensure you have the necessary permissions and billing enabled on your project.
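After the apply completes, you can verify the core resources from Python. The bucket and dataset names below are assumptions; match them to the names defined in terraform/:

    # check_infra.py -- confirm the provisioned resources are reachable
    from google.cloud import bigquery, storage

    # Names are placeholders; use the values from your Terraform configuration.
    print("GCS bucket exists:", storage.Client().bucket("health-raw-data").exists())
    dataset = bigquery.Client().get_dataset("raw_health_data")  # raises NotFound if absent
    print("BigQuery dataset:", dataset.dataset_id)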

Usage

Running the Pipeline (Airflow)

Once Cloud Composer is deployed, access the Airflow UI via the URL provided in the Terraform output. Two DAGs drive the pipeline (their schedules are sketched after this list):

  • cdc_ingestion_weekly: Runs every Wednesday to fetch the latest CDC FluView data.
  • clinic_logs_daily: Runs daily at 02:00 UTC to ingest internal clinic logs.
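In cron terms, those schedules look like the following (a sketch; the trigger hour for the weekly DAG is an assumption, so check the definitions in dags/):

    # Illustrative schedule definitions matching the DAGs above
    import pendulum
    from airflow.decorators import dag

    START = pendulum.datetime(2024, 1, 1, tz="UTC")

    @dag(schedule="0 12 * * 3", start_date=START, catchup=False)  # Wednesdays (hour assumed)
    def cdc_ingestion_weekly():
        ...

    @dag(schedule="0 2 * * *", start_date=START, catchup=False)  # daily at 02:00 UTC
    def clinic_logs_daily():
        ...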

Running Transformations (dbt)

To run the transformations manually from the dbt_project/ directory:

dbt debug  # Test connection
dbt deps   # Install dependencies
dbt run    # Run all models
dbt test   # Run data validation tests
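The same commands can be invoked from Python (for example, inside an Airflow task) through dbt's programmatic entry point; a minimal sketch, assuming dbt-core 1.5 or newer:

    # run_dbt.py -- invoke dbt programmatically (requires dbt-core >= 1.5)
    from dbt.cli.main import dbtRunner, dbtRunnerResult

    runner = dbtRunner()
    # Equivalent to `dbt run --project-dir dbt_project` on the command line
    res: dbtRunnerResult = runner.invoke(["run", "--project-dir", "dbt_project"])
    if not res.success:
        raise SystemExit(f"dbt run failed: {res.exception}")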

Project Structure

├── dags/                 # Airflow DAGs (Python)
├── dbt_project/          # dbt models, seeds, and tests
├── docs/                 # Documentation (Architecture, Guides, Tickets)
├── scripts/              # Helper scripts (backfills, data generation)
├── terraform/            # Infrastructure as Code (GCP resources)
└── requirements.txt      # Python dependencies

Developed by Forceliuss

About

An ELT pipeline for public disease data (CDC FluView / GRASP), built with GCP, Airflow, dbt, Astronomer, and Metabase. It provides a standard method for ingesting large health CSV files while filtering out PHI/PII.
