FirstAid LLM – Retrieval-Augmented First Aid Assistant

A professional healthcare guidance application powered by Retrieval-Augmented Generation (RAG). This system transforms authoritative first-aid literature into a searchable, citation-backed assistant that provides evidence-based medical guidance.

Live Demo: https://firstaidllm.org

Note: The live demo domain expires on 12/10/2026. After that date the demo remains reachable at 34.135.73.43, but voice input will not work there because it requires HTTPS.

Video: https://youtu.be/PpBqigQhSJI?si=RU83TSwYXWRvBZwZ

Developed for: AC215 (Advanced Practical Data Science) - Harvard University, Fall 2025

Overview

FirstAid LLM is an AI-powered first-aid assistant that provides instant, evidence-based medical guidance. The system uses:

RAG (Retrieval-Augmented Generation) - Semantic search across 99+ authoritative medical sources
Google Gemini 2.0 Flash - LLM for composing natural language responses
Emergency Classifier - DistilBERT model to detect life-threatening emergencies
Voice Input - Browser-native Web Speech API for hands-free queries
ChromaDB - Vector database for fast semantic similarity search

Key Features

Feature	Description
Evidence-Based Responses	All answers cite sources from 99+ authoritative medical guidelines
Emergency Detection	AI classifier identifies emergencies and prompts users to call 911
Voice Input	Speak your question using browser-native voice recognition
Citation Tracking	Every response shows source documents with relevance scores
Privacy First	No conversation logging; queries are not stored

Prerequisites

Required Software

Software	Version	Purpose
Docker Engine	20.10+	Container runtime
Docker Compose	2.0+	Local orchestration
Python	3.12+	For local development
Git	2.0+	Version control

For Cloud Deployment (Optional)

Software	Version	Purpose
Google Cloud SDK	Latest	GCP authentication
Pulumi CLI	3.0+	Infrastructure as Code
kubectl	1.28+	Kubernetes management

Google Cloud Requirements

GCP Project with the following APIs enabled:
- Vertex AI API
- Cloud Speech-to-Text API (optional, for server-side voice)
- Kubernetes Engine API (for cloud deployment)
- Artifact Registry API (for container images)
Service Account with roles:
- Vertex AI User
- Storage Object Viewer (for training data)
- Kubernetes Engine Developer (for deployment)

Environment Setup

# Enable required GCP APIs
gcloud services enable aiplatform.googleapis.com
gcloud services enable speech.googleapis.com
gcloud services enable container.googleapis.com
gcloud services enable artifactregistry.googleapis.com

Quick Start

Option 1: Local Development with Docker Compose (Recommended)

# 1. Clone the repository
git clone https://github.com/firstaid-llm/AC215_firstaid-llm.git
cd AC215_firstaid-llm

# 2. Set up credentials
mkdir -p secrets
cp /path/to/your/service-account-key.json ./secrets/llm-service-account.json

# 3. Start all services
docker compose up --build

# 4. Access the application
# Frontend: http://localhost:5000
# Emergency Classifier API: http://localhost:8100
# ChromaDB: http://localhost:8000

Option 2: Production Deployment (GKE)

# Deploy to Google Kubernetes Engine
cd deployment/scripts
./deploy.sh

Verify Installation

# Check service health
curl http://localhost:5000/health
curl http://localhost:8100/health
curl http://localhost:8000/api/v2/heartbeat
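
The same checks can be scripted. The following is a minimal sketch, assuming each endpoint above answers HTTP 200 when healthy; `check_services`, `http_status`, and the `fetch` parameter are illustrative names and are not part of the repository.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Endpoints copied from the curl commands above.
SERVICES = {
    "frontend": "http://localhost:5000/health",
    "emergency-classifier-api": "http://localhost:8100/health",
    "chromadb": "http://localhost:8000/api/v2/heartbeat",
}


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request to url."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def check_services(services=SERVICES, fetch=http_status):
    """Map each service name to True if its endpoint answered 200."""
    results = {}
    for name, url in services.items():
        try:
            results[name] = fetch(url) == 200
        except (URLError, OSError):
            # Connection refused, timeout, or an HTTP error status.
            results[name] = False
    return results
```

With the stack running, `check_services()` returns a dictionary such as `{"frontend": True, ...}`; passing a different `fetch` callable makes the helper usable without a network.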

Deployment

Local Development Stack

The docker-compose.yml starts four services:

Service	Port	Description
`frontend`	5000	Flask web application
`emergency-classifier-api`	8100	Emergency detection API
`chromadb`	8000	Vector database
`datapipeline`	-	ETL pipeline (runs once)

# Start all services
docker compose up --build

# Start specific services
docker compose up frontend chromadb emergency-classifier-api

# Run data pipeline only
docker compose up datapipeline

# View logs
docker compose logs -f frontend

Production Deployment (GKE Autopilot)

The application deploys to Google Kubernetes Engine using Pulumi for Infrastructure as Code.

One-Click Deployment

cd deployment/scripts
./deploy.sh

This script will:

1. Check prerequisites (Pulumi, gcloud, Docker)
2. Create Artifact Registry repository
3. Build and push Docker images
4. Deploy GKE Autopilot cluster
5. Deploy Kubernetes resources (Deployments, Services, HPAs)
6. Output the public frontend URL

Manual Deployment

cd deployment/pulumi

# Create virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Initialize and deploy
pulumi stack select dev
pulumi up --yes

# Get frontend URL
pulumi stack output frontend_url

Deployment Architecture

Component	Type	Replicas	Access
Frontend (Flask)	Deployment + HPA	2-6	LoadBalancer (Public)
Emergency Classifier	Deployment + HPA	1-4	ClusterIP (Internal)
ChromaDB	StatefulSet	1	ClusterIP (Internal)
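
The replica ranges in this table are enforced by Horizontal Pod Autoscalers. As a rough illustration, the function below applies the scaling rule described in the Kubernetes HPA documentation, desired = ceil(current * currentMetric / targetMetric), and clamps the result to a replica range. The metric values in the usage note are assumptions: this README does not state which metric or target the project's HPAs use, and the sketch ignores the controller's tolerance and stabilization windows.

```python
import math


def desired_replicas(current, current_metric, target_metric, min_replicas, max_replicas):
    """Apply the HPA scaling rule, clamped to [min_replicas, max_replicas]."""
    if current < 1 or target_metric <= 0:
        raise ValueError("current must be >= 1 and target_metric must be > 0")
    # Scale the current replica count by the ratio of observed to target load.
    raw = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, with the frontend's 2-6 range, two replicas at 90% average utilization against a hypothetical 50% target give `desired_replicas(2, 90, 50, 2, 6) == 4`.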

Access the Cluster

# Export kubeconfig
cd deployment/pulumi
pulumi stack output kubeconfig --show-secrets > kubeconfig.yaml
export KUBECONFIG="$(pwd)/kubeconfig.yaml"

# View resources
kubectl get pods -n firstaid
kubectl get svc -n firstaid
kubectl logs -n firstaid -l app=frontend

Teardown

cd deployment/scripts
./destroy.sh

For detailed deployment documentation, see deployment/pulumi/README.md.

Usage

Web Interface

Navigate to the application: http://localhost:5000 (local) or your deployed URL
Home Page: Enter your question in the search box or click an example query
Assistant Page: Chat with the AI assistant; view sources and citations
Sources Page: Browse the 99+ authoritative medical sources
About Page: Learn about the technology and team

Voice Input

The application supports browser-native voice input:

1. Click the microphone button on the home page or assistant page
2. Speak your question clearly
3. The transcript appears in the input field automatically

Supported browsers: Chrome, Edge, Safari (latest versions)

API Endpoints

Chat API

curl -X POST http://localhost:5000/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"message": "How do I treat a minor burn?"}'

Response:

{
  "answer": "For a minor burn, you should...",
  "sources": [
    {"index": 1, "title": "ABA Burn First Aid", "relevance": "89.2%"}
  ],
  "retrieval_time": 0.234,
  "total_time": 1.456,
  "emergency": {"is_emergency": false, "confidence": 0.12}
}
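
The same call can be made from Python using only the standard library. This is a minimal sketch, assuming the request and response shapes shown above; `build_chat_request`, `format_reply`, and `ask` are hypothetical helper names and not part of the repository.

```python
import json
from urllib.request import Request, urlopen


def build_chat_request(message, base_url="http://localhost:5000"):
    """Build the POST request shown in the curl example above."""
    body = json.dumps({"message": message}).encode("utf-8")
    return Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_reply(payload):
    """Render the answer followed by its numbered sources."""
    lines = [payload["answer"]]
    for source in payload.get("sources", []):
        lines.append(f"[{source['index']}] {source['title']} ({source['relevance']})")
    if payload.get("emergency", {}).get("is_emergency"):
        lines.append("Possible emergency detected: call 911.")
    return "\n".join(lines)


def ask(message, base_url="http://localhost:5000", timeout=60.0):
    """Send a question to a running frontend and return formatted text."""
    with urlopen(build_chat_request(message, base_url), timeout=timeout) as resp:
        return format_reply(json.load(resp))
```

With the local stack running, `print(ask("How do I treat a minor burn?"))` prints the answer and one line per cited source.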

Emergency Classifier API

curl -X POST http://localhost:8100/api/emergency \
  -H 'Content-Type: application/json' \
  -d '{"text": "I am having severe chest pain"}'

Response:

{
  "is_emergency": true,
  "label": "emergency",
  "confidence": 0.94
}

CLI Usage

RAG Query (Answer with Sources)

python src/datapipeline/postload_rag.py answer \
  -q "How to treat severe bleeding?" \
  --collection firstaid_guidelines \
  --k 8

Vector Search Only

python src/datapipeline/postload_rag.py search \
  -q "heat stroke first aid" \
  --n 10
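
To make the idea of a vector search concrete, the sketch below ranks toy document vectors by cosine similarity to a query vector and keeps the top `n`, mirroring the `--n` flag above. It is an illustration only: the real search is performed by ChromaDB over text-embedding-004 vectors, and which distance metric the project's collection uses is not stated in this README. `cosine_similarity` and `top_n` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query_vec, docs, n=10):
    """Return the n (doc_id, score) pairs most similar to query_vec.

    docs is an iterable of (doc_id, vector) pairs.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]
```

A query vector that points in the same direction as a document vector scores 1.0, and an orthogonal one scores 0.0, so the closest chunks come first in the result.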

Python API

import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./secrets/llm-service-account.json"
os.environ["GOOGLE_CLOUD_PROJECT"] = "firstaid-llm-479200"

from src.models.retrieve import retrieve_topk
from src.models.compose import compose_answer

# Retrieve relevant chunks
result = retrieve_topk("firstaid_guidelines", "treat severe bleeding", k=8)

# Compose answer with LLM
answer = compose_answer(
    "How to treat severe bleeding?",
    result["documents"][0][:5],
    result["metadatas"][0][:5]
)
print(answer)

Architecture

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              User Interface                                  │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                    Flask Frontend (Port 5000)                        │   │
│  │  • Home Page          • Assistant (Chat)                             │   │
│  │  • Sources Page       • About Page                                   │   │
│  │  • Voice Input (Web Speech API)                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                              │                                               │
│                              ▼                                               │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      Backend Services                                │   │
│  │                                                                       │   │
│  │  ┌───────────────────┐    ┌───────────────────┐                     │   │
│  │  │ Emergency         │    │ ChromaDB          │                     │   │
│  │  │ Classifier API    │    │ Vector Database   │                     │   │
│  │  │ (DistilBERT)      │    │ (Port 8000)       │                     │   │
│  │  │ (Port 8100)       │    │                   │                     │   │
│  │  └───────────────────┘    └───────────────────┘                     │   │
│  │                                                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                              │                                               │
│                              ▼                                               │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                    Google Cloud Services                             │   │
│  │                                                                       │   │
│  │  • Vertex AI (Gemini 2.0 Flash) - Answer composition                 │   │
│  │  • Vertex AI (text-embedding-004) - Document embeddings              │   │
│  │  • Cloud Storage - Training data versioning                          │   │
│  │                                                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘

Data Flow

┌──────────────────────────────────────────────────────────────────────────────┐
│ Data Pipeline                                                                 │
│                                                                               │
│ Sources (99+ PDFs/Web) → Preprocessing → Chunking → Embedding → ChromaDB     │
│                                                                               │
└──────────────────────────────────────────────────────────────────────────────┘
                                    ↓
┌──────────────────────────────────────────────────────────────────────────────┐
│ Query Processing                                                              │
│                                                                               │
│ User Query → Emergency Classifier ─┬─→ [Emergency? → 911 Modal]              │
│                                    │                                          │
│                                    └─→ Semantic Search → Gemini LLM           │
│                                                              ↓                │
│                                                    Cited Answer + Sources     │
└──────────────────────────────────────────────────────────────────────────────┘
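
The query-processing half of the diagram can be sketched as a single function with the three steps injected as callables. This is an illustration under assumptions, not the project's code: `handle_query`, `classify`, `search`, and `compose` are hypothetical names, and the sketch assumes the classifier and the retrieval path both run for every query, which matches the Chat API response carrying both an answer and an `emergency` field.

```python
def handle_query(query, classify, search, compose, k=8):
    """Run the query-processing flow with injected steps.

    classify(query)        -> dict such as {"is_emergency": bool, "confidence": float}
    search(query, k)       -> list of (chunk_text, source_title) pairs
    compose(query, chunks) -> answer text that cites the chunks
    """
    emergency = classify(query)      # emergency branch of the diagram
    chunks = search(query, k)        # semantic search over the vector store
    answer = compose(query, chunks)  # LLM composes the cited answer
    return {
        "answer": answer,
        "sources": [title for _, title in chunks],
        "emergency": emergency,
    }
```

Passing the steps in as arguments keeps the flow testable with stubs and leaves the real classifier, ChromaDB, and Gemini calls outside the sketch.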

Technology Stack

Layer	Technology
Frontend	Flask + Jinja2 + Vanilla JavaScript
Backend API	Flask (Gunicorn) + FastAPI
LLM	Google Gemini 2.0 Flash (via Vertex AI)
Embeddings	text-embedding-004 (Vertex AI)
Vector Database	ChromaDB
Emergency Classifier	DistilBERT (fine-tuned)
Voice Input	Web Speech API (browser-native)
Deployment	Docker + Kubernetes (GKE Autopilot)
IaC	Pulumi (Python)
CI/CD	GitHub Actions

Project Structure

firstaid-llm/
├── docker-compose.yml              # Local orchestration
├── Dockerfile                      # Base CI image
├── pyproject.toml                  # Root dependencies
├── Makefile                        # Build shortcuts
├── README.md                       # This file
│
├── secrets/                        # GCP credentials (gitignored)
│   └── llm-service-account.json
│
├── .github/
│   └── workflows/
│       ├── ci.yml                  # CI/CD pipeline
│       └── ml-training.yml         # ML training workflow
│
├── deployment/                     # Kubernetes deployment
│   ├── pulumi/
│   │   ├── __main__.py             # Pulumi entry point
│   │   ├── config.py               # Configuration helpers
│   │   ├── gke_cluster.py          # GKE cluster definition
│   │   ├── artifact_registry.py    # Container registry
│   │   ├── k8s_resources/          # Kubernetes manifests
│   │   │   ├── namespace.py
│   │   │   ├── chromadb.py
│   │   │   ├── frontend.py
│   │   │   ├── classifier.py
│   │   │   ├── services.py
│   │   │   └── hpa.py              # Horizontal Pod Autoscalers
│   │   └── README.md
│   └── scripts/
│       ├── deploy.sh               # One-click deploy
│       ├── destroy.sh              # Teardown
│       └── push-images.sh          # Build & push images
│
├── src/
│   ├── datapipeline/               # ETL pipeline
│   │   ├── Dockerfile
│   │   ├── run_pipeline.sh
│   │   ├── preprocess_rag.py       # Fetch & chunk documents
│   │   ├── embed_index.py          # Generate embeddings
│   │   ├── load_embeddings.py      # Load to ChromaDB
│   │   ├── postload_rag.py         # CLI query interface
│   │   └── data/data.yaml          # 99+ source definitions
│   │
│   ├── frontend/                   # Flask web application
│   │   ├── Dockerfile
│   │   ├── app_flask.py            # Main Flask app
│   │   ├── templates/              # Jinja2 templates
│   │   │   ├── base.html
│   │   │   ├── home.html
│   │   │   ├── assistant.html
│   │   │   ├── sources.html
│   │   │   └── about.html
│   │   ├── static/
│   │   │   ├── css/main.css
│   │   │   ├── js/main.js
│   │   │   └── assets/
│   │   └── pyproject.toml
│   │
│   └── models/
│       ├── Dockerfile
│       ├── retrieve.py             # ChromaDB retrieval
│       ├── compose.py              # Gemini answer composer
│       └── emergency_classifier/   # Emergency detection model
│           ├── Dockerfile          # Multi-stage (api/training)
│           ├── api.py              # FastAPI endpoint
│           ├── inference.py        # Classification logic
│           ├── train.py            # Model training
│           └── README.md
│
├── tests/
│   ├── unit/                       # Unit tests
│   ├── integration/                # Integration tests
│   ├── frontend/                   # Frontend tests
│   └── system/                     # System tests
│
└── docs/
    ├── ML_WORKFLOW.md              # ML training pipeline
    ├── TEST_COVERAGE.md            # Test coverage analysis
    └── application_design/

CI/CD Pipeline

Pipeline Overview

Push/PR → Build → Lint → Tests → [main branch only] → Deploy to GKE

CI Jobs (All Branches)

Job	Description	Time
`build`	Build Docker images, push to GHCR	~3 min
`lint-and-format`	Black formatter, Flake8 linter	~1 min
`unit-tests`	Unit tests with coverage	~2 min
`integration-tests`	Integration tests	~2 min
`frontend-tests`	Frontend component tests	~1 min
`chroma-system-tests`	ChromaDB integration	~2 min
`emergency-classifier-system-tests`	Classifier API tests	~3 min

CD Jobs (Main Branch Only)

Job	Description
`deploy-images`	Build and push to GCP Artifact Registry
`deploy-gke`	Deploy to GKE using Pulumi

Test Coverage

Module	Coverage	Status
`datapipeline/chunk_data.py`	60%	Checked
`datapipeline/clean_chunks.py`	73%	Checked
`datapipeline/utils.py`	76%	Checked
`models/compose.py`	95%	Checked
`models/emergency_classifier/dataset.py`	98%	Checked
Overall	78%	Exceeds 60% requirement

Required Secrets

Secret	Purpose
`FIRSTAID_LLM`	GCP service account JSON
`PULUMI_ACCESS_TOKEN`	Pulumi Cloud authentication
`SLACK_WEBHOOK_URL`	(Optional) Slack notifications

Known Issues and Limitations

Functional Limitations

Limitation	Description	Workaround
No Image Analysis	Cannot analyze photos of injuries	Describe the injury in text
No Real-Time Updates	Medical guidelines require manual updates	Run data pipeline periodically
Session-Based Chat	Chat history is lost on page refresh	Export chat before leaving

Technical Limitations

Issue	Description	Status
Apple Silicon Docker	Emergency classifier requires `platform: linux/amd64`	Handled in docker-compose.yml
Cold Start Latency	First query may take 5-10 seconds	Model caching after first request
Voice Input Browser Support	Web Speech API not supported in Firefox	Use Chrome, Edge, or Safari
ChromaDB Single Instance	No horizontal scaling for vector DB	Sufficient for current load

Known Issues

Bug	Description	Status
Voice button may not reset	After voice error, button may stay in "stop" state	Refresh page to reset

Emergency Classifier Thresholds

Confidence Level	Behavior
≥ 70%	Show full-screen emergency modal with 911 guidance
40% to < 70%	Show inline warning banner
< 40%	No emergency indication

Note: The classifier may produce false positives for phrases like "I'm not having an emergency" due to keyword detection. This is mitigated by the confidence threshold.
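
The threshold table can be expressed as a small function. This is a minimal sketch, assuming `confidence` is the classifier's emergency probability on a 0 to 1 scale (as in the API responses above) and that exactly 70% falls in the modal tier; `emergency_action` and its return values are hypothetical names.

```python
def emergency_action(confidence):
    """Map an emergency confidence in [0, 1] to the UI behavior in the table."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.70:
        return "modal"   # full-screen emergency modal with 911 guidance
    if confidence >= 0.40:
        return "banner"  # inline warning banner
    return "none"        # no emergency indication
```

With the example responses above, a confidence of 0.94 maps to the modal and 0.12 maps to no indication.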

Data Sources

Total Sources: 99+ authoritative medical guidelines
Organizations: ILCOR, IFRC, WHO, AHA, American Red Cross, CDC, NHS, and more
Last Updated: Check src/datapipeline/data/data.yaml for source list

Troubleshooting

Common Issues

Issue	Solution
Invalid JWT signature	Verify `./secrets/llm-service-account.json` matches your GCP project
403 PERMISSION_DENIED on Vertex AI	Enable Vertex AI API and grant `Vertex AI User` role to service account
403/404 during preprocessing	Some URLs block automation; check `_preprocessing_stats.json`
ChromaDB healthcheck fails	Ensure port 8000 is free: `lsof -i :8000`
Voice transcription fails	Use Chrome/Edge/Safari; Firefox doesn't support Web Speech API
PyTorch architecture error (Apple Silicon)	Already handled by `platform: linux/amd64` in docker-compose.yml
Network already exists error	Run `docker network rm firstaid-llm-rag-network` then retry
GKE cluster creation timeout	GKE Autopilot clusters take 10-15 minutes; check GCP Console
Image pull errors on GKE	Verify images are pushed: `gcloud artifacts docker images list ...`
LoadBalancer pending	External IP allocation takes 2-5 minutes

Debug Commands

# Check service health
curl http://localhost:5000/health
curl http://localhost:8100/health
curl http://localhost:8000/api/v2/heartbeat

# View Docker logs
docker compose logs -f frontend
docker compose logs -f emergency-classifier-api
docker compose logs -f chromadb

# Kubernetes debugging
kubectl get pods -n firstaid
kubectl describe pod -n firstaid <pod-name>
kubectl logs -n firstaid -l app=frontend
kubectl get events -n firstaid --sort-by='.lastTimestamp'

Team

FirstAiders - Harvard AC215 Fall 2025

Member	Responsibilities
Zhaocheng (Harry) Yang	RAG Architecture, Frontend, Voice Integration, Deployment
Ivan Gutierrez	Emergency Classifier, ML Training Pipeline, Frontend, Deployment
Sibo Zhou	CI/CD, Testing, Frontend, Data Versioning, Medium Post
Shupeng Luxu	Data Pipeline, Source Collection, Tutorial Video

Disclaimer

This application provides educational first-aid guidance only. It is NOT a substitute for professional medical care. In emergencies, always call 911 (US), 999 (UK), 112 (EU), or your local emergency number.

The information provided by FirstAid LLM:

Is for educational purposes only
Does not establish a doctor-patient relationship
May not be current or complete

Always seek professional medical attention for serious injuries or medical emergencies.

License

This project is developed for academic purposes as part of Harvard's AC215 course.

Additional Documentation

Deployment Guide - Detailed Kubernetes deployment
ML Training Workflow - Emergency classifier training pipeline
Test Coverage Analysis - Test coverage breakdown
Emergency Classifier - Classifier model details
