A professional healthcare guidance application powered by Retrieval-Augmented Generation (RAG). This system transforms authoritative first-aid literature into a searchable, citation-backed assistant that provides evidence-based medical guidance.
Live Demo: https://firstaidllm.org
Note: The live demo domain expires on 12/10/2026. After that date the demo remains reachable at 34.135.73.43, but voice input will not work there because it requires HTTPS.
Video: https://youtu.be/PpBqigQhSJI?si=RU83TSwYXWRvBZwZ
Developed for: AC215 (Advanced Practical Data Science) - Harvard University, Fall 2025
- Overview
- Prerequisites
- Quick Start
- Deployment
- Usage
- Architecture
- Project Structure
- CI/CD Pipeline
- Known Issues and Limitations
- Troubleshooting
- Team
- License
FirstAid LLM is an AI-powered first-aid assistant that provides instant, evidence-based medical guidance. The system uses:
- RAG (Retrieval-Augmented Generation) - Semantic search across 99+ authoritative medical sources
- Google Gemini 2.0 Flash - LLM for composing natural language responses
- Emergency Classifier - DistilBERT model to detect life-threatening emergencies
- Voice Input - Browser-native Web Speech API for hands-free queries
- ChromaDB - Vector database for fast semantic similarity search
| Feature | Description |
|---|---|
| Evidence-Based Responses | All answers cite sources from 99+ authoritative medical guidelines |
| Emergency Detection | AI classifier identifies emergencies and prompts users to call 911 |
| Voice Input | Speak your question using browser-native voice recognition |
| Citation Tracking | Every response shows source documents with relevance scores |
| Privacy First | No conversation logging; queries are not stored |
| Software | Version | Purpose |
|---|---|---|
| Docker Engine | 20.10+ | Container runtime |
| Docker Compose | 2.0+ | Local orchestration |
| Python | 3.12+ | For local development |
| Git | 2.0+ | Version control |
| Software | Version | Purpose |
|---|---|---|
| Google Cloud SDK | Latest | GCP authentication |
| Pulumi CLI | 3.0+ | Infrastructure as Code |
| kubectl | 1.28+ | Kubernetes management |
- GCP Project with the following APIs enabled:
  - Vertex AI API
  - Cloud Speech-to-Text API (optional, for server-side voice)
  - Kubernetes Engine API (for cloud deployment)
  - Artifact Registry API (for container images)
- Service Account with roles:
  - Vertex AI User
  - Storage Object Viewer (for training data)
  - Kubernetes Engine Developer (for deployment)
# Enable required GCP APIs
gcloud services enable aiplatform.googleapis.com
gcloud services enable speech.googleapis.com
gcloud services enable container.googleapis.com
gcloud services enable artifactregistry.googleapis.com

# 1. Clone the repository
git clone https://github.com/firstaid-llm/AC215_firstaid-llm.git
cd AC215_firstaid-llm
# 2. Set up credentials
mkdir -p secrets
cp /path/to/your/service-account-key.json ./secrets/llm-service-account.json
# 3. Start all services
docker compose up --build
# 4. Access the application
# Frontend: http://localhost:5000
# Emergency Classifier API: http://localhost:8100
# ChromaDB: http://localhost:8000

# Deploy to Google Kubernetes Engine
cd deployment/scripts
./deploy.sh

# Check service health
curl http://localhost:5000/health
curl http://localhost:8100/health
curl http://localhost:8000/api/v2/heartbeat

The docker-compose.yml starts four services:
| Service | Port | Description |
|---|---|---|
| frontend | 5000 | Flask web application |
| emergency-classifier-api | 8100 | Emergency detection API |
| chromadb | 8000 | Vector database |
| datapipeline | - | ETL pipeline (runs once) |
# Start all services
docker compose up --build
# Start specific services
docker compose up frontend chromadb emergency-classifier-api
# Run data pipeline only
docker compose up datapipeline
# View logs
docker compose logs -f frontend

The application deploys to Google Kubernetes Engine using Pulumi for Infrastructure as Code.
cd deployment/scripts
./deploy.sh

This script will:
- Check prerequisites (Pulumi, gcloud, Docker)
- Create Artifact Registry repository
- Build and push Docker images
- Deploy GKE Autopilot cluster
- Deploy Kubernetes resources (Deployments, Services, HPAs)
- Output the public frontend URL
cd deployment/pulumi
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Initialize and deploy
pulumi stack select dev
pulumi up --yes
# Get frontend URL
pulumi stack output frontend_url

| Component | Type | Replicas | Access |
|---|---|---|---|
| Frontend (Flask) | Deployment + HPA | 2-6 | LoadBalancer (Public) |
| Emergency Classifier | Deployment + HPA | 1-4 | ClusterIP (Internal) |
| ChromaDB | StatefulSet | 1 | ClusterIP (Internal) |
# Export kubeconfig
cd deployment/pulumi
pulumi stack output kubeconfig --show-secrets > kubeconfig.yaml
export KUBECONFIG=$(pwd)/kubeconfig.yaml
# View resources
kubectl get pods -n firstaid
kubectl get svc -n firstaid
kubectl logs -n firstaid -l app=frontend

cd deployment/scripts
./destroy.sh

For detailed deployment documentation, see deployment/pulumi/README.md.
- Navigate to the application: http://localhost:5000 (local) or your deployed URL
- Home Page: Enter your question in the search box or click an example query
- Assistant Page: Chat with the AI assistant; view sources and citations
- Sources Page: Browse the 99+ authoritative medical sources
- About Page: Learn about the technology and team
The application supports browser-native voice input:
- Click the microphone button on the home page or assistant page
- Speak your question clearly
- The transcript appears in the input field automatically
Supported browsers: Chrome, Edge, Safari (latest versions)
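For scripted access without a browser, the chat endpoint at /api/chat can also be called from Python. The following is a minimal sketch using only the standard library. It assumes the request and response shapes shown in this README's curl example and a frontend running locally on port 5000. The helper names (build_chat_request, summarize_response, ask) are hypothetical and not part of the project.

```python
import json
import urllib.request

# Assumption: a locally running frontend, as started by docker compose.
BASE_URL = "http://localhost:5000"


def build_chat_request(message: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /api/chat carrying a JSON body."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_response(payload: dict) -> str:
    """Format the answer followed by one numbered citation per source."""
    lines = [payload.get("answer", "")]
    for src in payload.get("sources", []):
        lines.append(f"[{src['index']}] {src['title']} ({src['relevance']})")
    if payload.get("emergency", {}).get("is_emergency"):
        lines.append("Possible emergency detected: call 911.")
    return "\n".join(lines)


def ask(message: str, timeout: float = 30.0) -> dict:
    """Send the question to the running service and decode the JSON reply."""
    with urllib.request.urlopen(build_chat_request(message), timeout=timeout) as resp:
        return json.load(resp)
```

With the services running, `print(summarize_response(ask("How do I treat a minor burn?")))` prints the answer followed by its sources.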
curl -X POST http://localhost:5000/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"message": "How do I treat a minor burn?"}'

Response:
{
  "answer": "For a minor burn, you should...",
  "sources": [
    {"index": 1, "title": "ABA Burn First Aid", "relevance": "89.2%"}
  ],
  "retrieval_time": 0.234,
  "total_time": 1.456,
  "emergency": {"is_emergency": false, "confidence": 0.12}
}

curl -X POST http://localhost:8100/api/emergency \
  -H 'Content-Type: application/json' \
  -d '{"text": "I am having severe chest pain"}'

Response:
{
  "is_emergency": true,
  "label": "emergency",
  "confidence": 0.94
}

python src/datapipeline/postload_rag.py answer \
  -q "How to treat severe bleeding?" \
  --collection firstaid_guidelines \
  --k 8

python src/datapipeline/postload_rag.py search \
  -q "heat stroke first aid" \
  --n 10

import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./secrets/llm-service-account.json"
os.environ["GOOGLE_CLOUD_PROJECT"] = "firstaid-llm-479200"
from src.models.retrieve import retrieve_topk
from src.models.compose import compose_answer
# Retrieve relevant chunks
result = retrieve_topk("firstaid_guidelines", "treat severe bleeding", k=8)
# Compose answer with LLM
answer = compose_answer(
    "How to treat severe bleeding?",
    result["documents"][0][:5],
    result["metadatas"][0][:5]
)
print(answer)

┌─────────────────────────────────────────────────────────────────────────────┐
│ User Interface │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Flask Frontend (Port 5000) │ │
│ │ • Home Page • Assistant (Chat) │ │
│ │ • Sources Page • About Page │ │
│ │ • Voice Input (Web Speech API) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Backend Services │ │
│ │ │ │
│ │ ┌───────────────────┐ ┌───────────────────┐ │ │
│ │ │ Emergency │ │ ChromaDB │ │ │
│ │ │ Classifier API │ │ Vector Database │ │ │
│ │ │ (DistilBERT) │ │ (Port 8000) │ │ │
│ │ │ (Port 8100) │ │ │ │ │
│ │ └───────────────────┘ └───────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Google Cloud Services │ │
│ │ │ │
│ │ • Vertex AI (Gemini 2.0 Flash) - Answer composition │ │
│ │ • Vertex AI (text-embedding-004) - Document embeddings │ │
│ │ • Cloud Storage - Training data versioning │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────┐
│ Data Pipeline │
│ │
│ Sources (99+ PDFs/Web) → Preprocessing → Chunking → Embedding → ChromaDB │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────────────────┐
│ Query Processing │
│ │
│ User Query → Emergency Classifier ─┬─→ [Emergency? → 911 Modal] │
│ │ │
│ └─→ Semantic Search → Gemini LLM │
│ ↓ │
│ Cited Answer + Sources │
└──────────────────────────────────────────────────────────────────────────────┘
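The query-processing flow in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation. The three callables are hypothetical stand-ins for the classifier API, the ChromaDB search, and the Gemini composer. It also assumes both branches always run, which the combined answer and emergency fields in the /api/chat response suggest but do not confirm.

```python
from typing import Callable


def handle_query(
    query: str,
    classify: Callable[[str], dict],
    retrieve: Callable[[str], list],
    compose: Callable[[str, list], str],
) -> dict:
    """Run the emergency check and the RAG path, then merge both results."""
    emergency = classify(query)      # e.g. {"is_emergency": bool, "confidence": float}
    chunks = retrieve(query)         # each chunk: {"title": str, "text": str}
    answer = compose(query, chunks)
    sources = [
        {"index": i, "title": chunk["title"]}
        for i, chunk in enumerate(chunks, start=1)
    ]
    return {"answer": answer, "sources": sources, "emergency": emergency}


# Stub implementations, for illustration only (all hypothetical).
def fake_classify(query: str) -> dict:
    hit = "chest pain" in query.lower()
    return {"is_emergency": hit, "confidence": 0.94 if hit else 0.12}


def fake_retrieve(query: str) -> list:
    return [{"title": "ABA Burn First Aid", "text": "Example passage text."}]


def fake_compose(query: str, chunks: list) -> str:
    # Append a numbered citation marker after each passage.
    return " ".join(f"{c['text']} [{i}]" for i, c in enumerate(chunks, start=1))


result = handle_query(
    "How do I treat a minor burn?", fake_classify, fake_retrieve, fake_compose
)
```

Passing the three stages in as callables keeps the orchestration testable without a running classifier, vector database, or LLM.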
| Layer | Technology |
|---|---|
| Frontend | Flask + Jinja2 + Vanilla JavaScript |
| Backend API | Flask (Gunicorn) + FastAPI |
| LLM | Google Gemini 2.0 Flash (via Vertex AI) |
| Embeddings | text-embedding-004 (Vertex AI) |
| Vector Database | ChromaDB |
| Emergency Classifier | DistilBERT (fine-tuned) |
| Voice Input | Web Speech API (browser-native) |
| Deployment | Docker + Kubernetes (GKE Autopilot) |
| IaC | Pulumi (Python) |
| CI/CD | GitHub Actions |
firstaid-llm/
├── docker-compose.yml # Local orchestration
├── Dockerfile # Base CI image
├── pyproject.toml # Root dependencies
├── Makefile # Build shortcuts
├── README.md # This file
│
├── secrets/ # GCP credentials (gitignored)
│ └── llm-service-account.json
│
├── .github/
│ └── workflows/
│ ├── ci.yml # CI/CD pipeline
│ └── ml-training.yml # ML training workflow
│
├── deployment/ # Kubernetes deployment
│ ├── pulumi/
│ │ ├── __main__.py # Pulumi entry point
│ │ ├── config.py # Configuration helpers
│ │ ├── gke_cluster.py # GKE cluster definition
│ │ ├── artifact_registry.py # Container registry
│ │ ├── k8s_resources/ # Kubernetes manifests
│ │ │ ├── namespace.py
│ │ │ ├── chromadb.py
│ │ │ ├── frontend.py
│ │ │ ├── classifier.py
│ │ │ ├── services.py
│ │ │ └── hpa.py # Horizontal Pod Autoscalers
│ │ └── README.md
│ └── scripts/
│ ├── deploy.sh # One-click deploy
│ ├── destroy.sh # Teardown
│ └── push-images.sh # Build & push images
│
├── src/
│ ├── datapipeline/ # ETL pipeline
│ │ ├── Dockerfile
│ │ ├── run_pipeline.sh
│ │ ├── preprocess_rag.py # Fetch & chunk documents
│ │ ├── embed_index.py # Generate embeddings
│ │ ├── load_embeddings.py # Load to ChromaDB
│ │ ├── postload_rag.py # CLI query interface
│ │ └── data/data.yaml # 99+ source definitions
│ │
│ ├── frontend/ # Flask web application
│ │ ├── Dockerfile
│ │ ├── app_flask.py # Main Flask app
│ │ ├── templates/ # Jinja2 templates
│ │ │ ├── base.html
│ │ │ ├── home.html
│ │ │ ├── assistant.html
│ │ │ ├── sources.html
│ │ │ └── about.html
│ │ ├── static/
│ │ │ ├── css/main.css
│ │ │ ├── js/main.js
│ │ │ └── assets/
│ │ └── pyproject.toml
│ │
│ └── models/
│ ├── Dockerfile
│ ├── retrieve.py # ChromaDB retrieval
│ ├── compose.py # Gemini answer composer
│ └── emergency_classifier/ # Emergency detection model
│ ├── Dockerfile # Multi-stage (api/training)
│ ├── api.py # FastAPI endpoint
│ ├── inference.py # Classification logic
│ ├── train.py # Model training
│ └── README.md
│
├── tests/
│ ├── unit/ # Unit tests
│ ├── integration/ # Integration tests
│ ├── frontend/ # Frontend tests
│ └── system/ # System tests
│
└── docs/
├── ML_WORKFLOW.md # ML training pipeline
├── TEST_COVERAGE.md # Test coverage analysis
└── application_design/
Push/PR → Build → Lint → Tests → [main branch only] → Deploy to GKE
| Job | Description | Time |
|---|---|---|
| build | Build Docker images, push to GHCR | ~3 min |
| lint-and-format | Black formatter, Flake8 linter | ~1 min |
| unit-tests | Unit tests with coverage | ~2 min |
| integration-tests | Integration tests | ~2 min |
| frontend-tests | Frontend component tests | ~1 min |
| chroma-system-tests | ChromaDB integration | ~2 min |
| emergency-classifier-system-tests | Classifier API tests | ~3 min |
| Job | Description |
|---|---|
| deploy-images | Build and push to GCP Artifact Registry |
| deploy-gke | Deploy to GKE using Pulumi |
| Module | Coverage | Status |
|---|---|---|
| datapipeline/chunk_data.py | 60% | Checked |
| datapipeline/clean_chunks.py | 73% | Checked |
| datapipeline/utils.py | 76% | Checked |
| models/compose.py | 95% | Checked |
| models/emergency_classifier/dataset.py | 98% | Checked |
| Overall | 78% | Exceeds 60% requirement |
| Secret | Purpose |
|---|---|
| FIRSTAID_LLM | GCP service account JSON |
| PULUMI_ACCESS_TOKEN | Pulumi Cloud authentication |
| SLACK_WEBHOOK_URL | (Optional) Slack notifications |
| Limitation | Description | Workaround |
|---|---|---|
| No Image Analysis | Cannot analyze photos of injuries | Describe the injury in text |
| No Real-Time Updates | Medical guidelines require manual updates | Run data pipeline periodically |
| Session-Based Chat | Chat history is lost on page refresh | Export chat before leaving |
| Issue | Description | Status |
|---|---|---|
| Apple Silicon Docker | Emergency classifier requires platform: linux/amd64 | Handled in docker-compose.yml |
| Cold Start Latency | First query may take 5-10 seconds | Model caching after first request |
| Voice Input Browser Support | Web Speech API not supported in Firefox | Use Chrome, Edge, or Safari |
| ChromaDB Single Instance | No horizontal scaling for vector DB | Sufficient for current load |
| Bug | Description | Status |
|---|---|---|
| Voice button may not reset | After voice error, button may stay in "stop" state | Refresh page to reset |
| Confidence Level | Behavior |
|---|---|
| ≥ 70% | Show full-screen emergency modal with 911 guidance |
| 40-70% | Show inline warning banner |
| < 40% | No emergency indication |
Note: The classifier may produce false positives for phrases like "I'm not having an emergency" due to keyword detection. This is mitigated by the confidence threshold.
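The threshold table translates directly into a small mapping function. The sketch below is illustrative only: the function name and the returned labels are hypothetical. It assumes the confidence is a probability between 0 and 1, that exactly 70% triggers the modal, and that exactly 40% triggers the banner.

```python
def emergency_ui_action(confidence: float) -> str:
    """Map classifier confidence (0.0-1.0) to the UI behavior in the table."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= 0.70:
        return "modal"    # full-screen emergency modal with 911 guidance
    if confidence >= 0.40:
        return "banner"   # inline warning banner
    return "none"         # no emergency indication
```

With the sample responses in this README, a confidence of 0.94 maps to the modal and 0.12 to no indication.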
- Total Sources: 99+ authoritative medical guidelines
- Organizations: ILCOR, IFRC, WHO, AHA, American Red Cross, CDC, NHS, and more
- Last Updated: Check src/datapipeline/data/data.yaml for the source list
| Issue | Solution |
|---|---|
| Invalid JWT signature | Verify ./secrets/llm-service-account.json matches your GCP project |
| 403 PERMISSION_DENIED on Vertex AI | Enable Vertex AI API and grant Vertex AI User role to service account |
| 403/404 during preprocessing | Some URLs block automation; check _preprocessing_stats.json |
| ChromaDB healthcheck fails | Ensure port 8000 is free: lsof -i :8000 |
| Voice transcription fails | Use Chrome/Edge/Safari; Firefox doesn't support Web Speech API |
| PyTorch architecture error (Apple Silicon) | Already handled: docker-compose.yml sets platform: linux/amd64 |
| Network already exists error | Run docker network rm firstaid-llm-rag-network then retry |
| GKE cluster creation timeout | GKE Autopilot clusters take 10-15 minutes; check GCP Console |
| Image pull errors on GKE | Verify images are pushed: gcloud artifacts docker images list ... |
| LoadBalancer pending | External IP allocation takes 2-5 minutes |
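The three local health endpoints used in this README can also be polled from Python, which is convenient in scripts that wait for the stack to come up. This is a minimal sketch using only the standard library. It assumes an HTTP 200 response means healthy, and the names HEALTH_ENDPOINTS, is_healthy, and check_all are hypothetical.

```python
import urllib.request

# Endpoints as listed in this README for a local docker compose run.
HEALTH_ENDPOINTS = {
    "frontend": "http://localhost:5000/health",
    "emergency-classifier-api": "http://localhost:8100/health",
    "chromadb": "http://localhost:8000/api/v2/heartbeat",
}


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError, HTTPError, refused connections, and timeouts.
        return False


def check_all(endpoints: dict = HEALTH_ENDPOINTS, probe=is_healthy) -> dict:
    """Probe every endpoint and return a name -> healthy mapping."""
    return {name: probe(url) for name, url in endpoints.items()}
```

Calling `check_all()` returns a dictionary such as `{"frontend": True, ...}`; the `probe` parameter allows the network call to be swapped out in tests.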
# Check service health
curl http://localhost:5000/health
curl http://localhost:8100/health
curl http://localhost:8000/api/v2/heartbeat
# View Docker logs
docker compose logs -f frontend
docker compose logs -f emergency-classifier-api
docker compose logs -f chromadb
# Kubernetes debugging
kubectl get pods -n firstaid
kubectl describe pod -n firstaid <pod-name>
kubectl logs -n firstaid -l app=frontend
kubectl get events -n firstaid --sort-by='.lastTimestamp'

FirstAiders - Harvard AC215 Fall 2025
| Member | Responsibilities |
|---|---|
| Zhaocheng (Harry) Yang | RAG Architecture, Frontend, Voice Integration, Deployment |
| Ivan Gutierrez | Emergency Classifier, ML Training Pipeline, Frontend, Deployment |
| Sibo Zhou | CI/CD, Testing, Frontend, Data Versioning, Medium Post |
| Shupeng Luxu | Data Pipeline, Source Collection, Tutorial Video |
This application provides educational first-aid guidance only. It is NOT a substitute for professional medical care. In emergencies, always call 911 (US), 999 (UK), 112 (EU), or your local emergency number.
The information provided by FirstAid LLM:
- Is for educational purposes only
- Does not establish a doctor-patient relationship
- May not be current or complete
Always seek professional medical attention for serious injuries or medical emergencies.
This project is developed for academic purposes as part of Harvard's AC215 course.
- Deployment Guide - Detailed Kubernetes deployment
- ML Training Workflow - Emergency classifier training pipeline
- Test Coverage Analysis - Test coverage breakdown
- Emergency Classifier - Classifier model details