FirstAid LLM – Retrieval-Augmented First Aid Assistant

A professional healthcare guidance application powered by Retrieval-Augmented Generation (RAG). This system transforms authoritative first-aid literature into a searchable, citation-backed assistant that provides evidence-based medical guidance.

Live Demo: https://firstaidllm.org

Note: The live demo domain expires on 12/10/2026. After that date the demo remains reachable at 34.135.73.43, but voice input will stop working there because it requires HTTPS.

Video: https://youtu.be/PpBqigQhSJI?si=RU83TSwYXWRvBZwZ

Developed for: AC215 (Advanced Practical Data Science) - Harvard University, Fall 2025



Overview

FirstAid LLM is an AI-powered first-aid assistant that provides instant, evidence-based medical guidance. The system uses:

  • RAG (Retrieval-Augmented Generation) - Semantic search across 99+ authoritative medical sources
  • Google Gemini 2.0 Flash - LLM for composing natural language responses
  • Emergency Classifier - DistilBERT model to detect life-threatening emergencies
  • Voice Input - Browser-native Web Speech API for hands-free queries
  • ChromaDB - Vector database for fast semantic similarity search

Key Features

| Feature | Description |
| --- | --- |
| Evidence-Based Responses | All answers cite sources from 99+ authoritative medical guidelines |
| Emergency Detection | AI classifier identifies emergencies and prompts users to call 911 |
| Voice Input | Speak your question using browser-native voice recognition |
| Citation Tracking | Every response shows source documents with relevance scores |
| Privacy First | No conversation logging; queries are not stored |

Prerequisites

Required Software

| Software | Version | Purpose |
| --- | --- | --- |
| Docker Engine | 20.10+ | Container runtime |
| Docker Compose | 2.0+ | Local orchestration |
| Python | 3.12+ | For local development |
| Git | 2.0+ | Version control |

For Cloud Deployment (Optional)

| Software | Version | Purpose |
| --- | --- | --- |
| Google Cloud SDK | Latest | GCP authentication |
| Pulumi CLI | 3.0+ | Infrastructure as Code |
| kubectl | 1.28+ | Kubernetes management |

Google Cloud Requirements

  1. GCP Project with the following APIs enabled:

    • Vertex AI API
    • Cloud Speech-to-Text API (optional, for server-side voice)
    • Kubernetes Engine API (for cloud deployment)
    • Artifact Registry API (for container images)
  2. Service Account with roles:

    • Vertex AI User
    • Storage Object Viewer (for training data)
    • Kubernetes Engine Developer (for deployment)

Environment Setup

# Enable required GCP APIs
gcloud services enable aiplatform.googleapis.com
gcloud services enable speech.googleapis.com
gcloud services enable container.googleapis.com
gcloud services enable artifactregistry.googleapis.com

Quick Start

Option 1: Local Development with Docker Compose (Recommended)

# 1. Clone the repository
# (cloned into ./firstaid-llm so the next command and the paths below match)
git clone https://github.com/firstaid-llm/AC215_firstaid-llm.git firstaid-llm
cd firstaid-llm

# 2. Set up credentials
mkdir -p secrets
cp /path/to/your/service-account-key.json ./secrets/llm-service-account.json

# 3. Start all services
docker compose up --build

# 4. Access the application
# Frontend: http://localhost:5000
# Emergency Classifier API: http://localhost:8100
# ChromaDB: http://localhost:8000

Option 2: Production Deployment (GKE)

# Deploy to Google Kubernetes Engine
cd deployment/scripts
./deploy.sh

Verify Installation

# Check service health
curl http://localhost:5000/health
curl http://localhost:8100/health
curl http://localhost:8000/api/v2/heartbeat
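
The same checks can be scripted. The sketch below is a minimal, hedged example and not part of the repository: it polls the three endpoints listed above and reports which ones answer with HTTP 200. The names `HEALTH_URLS`, `http_status`, and `check_services` are hypothetical, and treating any non-200 status as unhealthy is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Endpoints copied from the curl commands above.
HEALTH_URLS = {
    "frontend": "http://localhost:5000/health",
    "emergency-classifier-api": "http://localhost:8100/health",
    "chromadb": "http://localhost:8000/api/v2/heartbeat",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return exc.code
    except (urllib.error.URLError, OSError):  # refused, timeout, DNS failure
        return 0


def check_services(
    urls: Dict[str, str] = HEALTH_URLS,
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True when its endpoint returns HTTP 200.

    Assumption: any status other than 200 counts as unhealthy.
    """
    return {name: fetch(url) == 200 for name, url in urls.items()}
```

Calling `check_services()` with no arguments uses the real HTTP fetcher; passing a custom `fetch` function makes the check testable without running containers.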

Deployment

Local Development Stack

The docker-compose.yml starts four services:

| Service | Port | Description |
| --- | --- | --- |
| frontend | 5000 | Flask web application |
| emergency-classifier-api | 8100 | Emergency detection API |
| chromadb | 8000 | Vector database |
| datapipeline | - | ETL pipeline (runs once) |

# Start all services
docker compose up --build

# Start specific services
docker compose up frontend chromadb emergency-classifier-api

# Run data pipeline only
docker compose up datapipeline

# View logs
docker compose logs -f frontend

Production Deployment (GKE Autopilot)

The application deploys to Google Kubernetes Engine using Pulumi for Infrastructure as Code.

One-Click Deployment

cd deployment/scripts
./deploy.sh

This script will:

  1. Check prerequisites (Pulumi, gcloud, Docker)
  2. Create Artifact Registry repository
  3. Build and push Docker images
  4. Deploy GKE Autopilot cluster
  5. Deploy Kubernetes resources (Deployments, Services, HPAs)
  6. Output the public frontend URL

Manual Deployment

cd deployment/pulumi

# Create virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Initialize and deploy
pulumi stack select dev
pulumi up --yes

# Get frontend URL
pulumi stack output frontend_url

Deployment Architecture

| Component | Type | Replicas | Access |
| --- | --- | --- | --- |
| Frontend (Flask) | Deployment + HPA | 2-6 | LoadBalancer (Public) |
| Emergency Classifier | Deployment + HPA | 1-4 | ClusterIP (Internal) |
| ChromaDB | StatefulSet | 1 | ClusterIP (Internal) |
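
To make the replica ranges concrete, here is a small sketch of the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired count is `ceil(currentReplicas * currentMetric / targetMetric)`, clamped to the configured bounds. The target utilization used in the usage note is an assumed value, not one taken from this project's `hpa.py`, and the sketch ignores the HPA's tolerance band and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Core HPA rule: scale proportionally to metric/target, then clamp."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, with the frontend bounds of 2-6 and an assumed CPU target of 60%, two pods running at 90% would scale to `desired_replicas(2, 90, 60, 2, 6)`, which is 3 pods.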

Access the Cluster

# Export kubeconfig
cd deployment/pulumi
pulumi stack output kubeconfig --show-secrets > kubeconfig.yaml
export KUBECONFIG=$(pwd)/kubeconfig.yaml

# View resources
kubectl get pods -n firstaid
kubectl get svc -n firstaid
kubectl logs -n firstaid -l app=frontend

Teardown

cd deployment/scripts
./destroy.sh

For detailed deployment documentation, see deployment/pulumi/README.md.


Usage

Web Interface

  1. Navigate to the application: http://localhost:5000 (local) or your deployed URL
  2. Home Page: Enter your question in the search box or click an example query
  3. Assistant Page: Chat with the AI assistant; view sources and citations
  4. Sources Page: Browse the 99+ authoritative medical sources
  5. About Page: Learn about the technology and team

Voice Input

The application supports browser-native voice input:

  1. Click the microphone button on the home page or assistant page
  2. Speak your question clearly
  3. The transcript appears in the input field automatically

Supported browsers: Chrome, Edge, Safari (latest versions)

API Endpoints

Chat API

curl -X POST http://localhost:5000/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"message": "How do I treat a minor burn?"}'

Response:

{
  "answer": "For a minor burn, you should...",
  "sources": [
    {"index": 1, "title": "ABA Burn First Aid", "relevance": "89.2%"}
  ],
  "retrieval_time": 0.234,
  "total_time": 1.456,
  "emergency": {"is_emergency": false, "confidence": 0.12}
}
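
A Python client for this endpoint might look like the following sketch. It only relies on the request and response shapes shown above; the helper names (`Source`, `build_chat_request`, `parse_sources`) are hypothetical and do not exist in the repository, and converting the `relevance` string to a fraction is a design choice made here for easier sorting and filtering.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    index: int
    title: str
    relevance: float  # fraction in [0, 1], parsed from strings such as "89.2%"


def build_chat_request(
    message: str, base_url: str = "http://localhost:5000"
) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sources(payload: dict) -> List[Source]:
    """Convert the "sources" array into typed records with numeric relevance."""
    return [
        Source(
            index=item["index"],
            title=item["title"],
            relevance=float(item["relevance"].rstrip("%")) / 100,
        )
        for item in payload.get("sources", [])
    ]
```

To send the request, pass the result of `build_chat_request(...)` to `urllib.request.urlopen`, decode the body with `json.load`, and hand the resulting dictionary to `parse_sources`.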

Emergency Classifier API

curl -X POST http://localhost:8100/api/emergency \
  -H 'Content-Type: application/json' \
  -d '{"text": "I am having severe chest pain"}'

Response:

{
  "is_emergency": true,
  "label": "emergency",
  "confidence": 0.94
}

CLI Usage

RAG Query (Answer with Sources)

python src/datapipeline/postload_rag.py answer \
  -q "How to treat severe bleeding?" \
  --collection firstaid_guidelines \
  --k 8

Vector Search Only

python src/datapipeline/postload_rag.py search \
  -q "heat stroke first aid" \
  --n 10

Python API

import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./secrets/llm-service-account.json"
os.environ["GOOGLE_CLOUD_PROJECT"] = "firstaid-llm-479200"

from src.models.retrieve import retrieve_topk
from src.models.compose import compose_answer

# Retrieve relevant chunks
result = retrieve_topk("firstaid_guidelines", "treat severe bleeding", k=8)

# Compose answer with LLM
answer = compose_answer(
    "How to treat severe bleeding?",
    result["documents"][0][:5],
    result["metadatas"][0][:5]
)
print(answer)
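
The indexing above (`result["documents"][0]`, `result["metadatas"][0]`) reflects a result that holds one list of hits per query. As an illustration only, the sketch below numbers the top chunks so that an answer can cite them as [1], [2], and so on. It is not the logic in `compose.py`; the function name `build_context` and the `"title"` metadata key are assumptions.

```python
from typing import Dict, List


def build_context(result: Dict[str, list], top_n: int = 5) -> str:
    """Number the top chunks of the first query so they can be cited by index."""
    docs: List[str] = result["documents"][0][:top_n]
    metas: List[dict] = result["metadatas"][0][:top_n]
    lines = []
    for i, (doc, meta) in enumerate(zip(docs, metas), start=1):
        # Assumption: each metadata dict may carry a "title" key.
        title = meta.get("title", "unknown source")
        lines.append(f"[{i}] {title}: {doc}")
    return "\n".join(lines)
```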

Architecture

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              User Interface                                  │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                    Flask Frontend (Port 5000)                        │   │
│  │  • Home Page          • Assistant (Chat)                             │   │
│  │  • Sources Page       • About Page                                   │   │
│  │  • Voice Input (Web Speech API)                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                              │                                               │
│                              ▼                                               │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      Backend Services                                │   │
│  │                                                                       │   │
│  │  ┌───────────────────┐    ┌───────────────────┐                     │   │
│  │  │ Emergency         │    │ ChromaDB          │                     │   │
│  │  │ Classifier API    │    │ Vector Database   │                     │   │
│  │  │ (DistilBERT)      │    │ (Port 8000)       │                     │   │
│  │  │ (Port 8100)       │    │                   │                     │   │
│  │  └───────────────────┘    └───────────────────┘                     │   │
│  │                                                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                              │                                               │
│                              ▼                                               │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                    Google Cloud Services                             │   │
│  │                                                                       │   │
│  │  • Vertex AI (Gemini 2.0 Flash) - Answer composition                 │   │
│  │  • Vertex AI (text-embedding-004) - Document embeddings              │   │
│  │  • Cloud Storage - Training data versioning                          │   │
│  │                                                                       │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘

Data Flow

┌──────────────────────────────────────────────────────────────────────────────┐
│ Data Pipeline                                                                 │
│                                                                               │
│ Sources (99+ PDFs/Web) → Preprocessing → Chunking → Embedding → ChromaDB     │
│                                                                               │
└──────────────────────────────────────────────────────────────────────────────┘
                                    ↓
┌──────────────────────────────────────────────────────────────────────────────┐
│ Query Processing                                                              │
│                                                                               │
│ User Query → Emergency Classifier ─┬─→ [Emergency? → 911 Modal]              │
│                                    │                                          │
│                                    └─→ Semantic Search → Gemini LLM           │
│                                                              ↓                │
│                                                    Cited Answer + Sources     │
└──────────────────────────────────────────────────────────────────────────────┘
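
The chunk, embed, and search stages can be illustrated with a toy, dependency-free sketch. It is not the project's pipeline: word-count vectors stand in for text-embedding-004, an in-memory list stands in for ChromaDB, and the chunk sizes and the sample sentences in the usage note are made-up placeholders.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk(text: str, size: int = 12, overlap: int = 4) -> List[str]:
    """Split text into windows of `size` words that overlap by `overlap`."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs: List[str]) -> List[Tuple[str, Counter]]:
    """Chunk every document and store (chunk text, vector) pairs."""
    return [(c, embed(c)) for doc in docs for c in chunk(doc)]


def search(
    query: str, index: List[Tuple[str, Counter]], k: int = 3
) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, vec), text) for text, vec in index),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return scored[:k]
```

With an index built from two placeholder sentences, one about cooling a burn and one about applying pressure to a bleeding wound, a query containing "bleeding" and "wound" ranks the second sentence first because only it shares words with the query. A real embedding model also matches paraphrases that share no words, which this toy version cannot do.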

Technology Stack

| Layer | Technology |
| --- | --- |
| Frontend | Flask + Jinja2 + Vanilla JavaScript |
| Backend API | Flask (Gunicorn) + FastAPI |
| LLM | Google Gemini 2.0 Flash (via Vertex AI) |
| Embeddings | text-embedding-004 (Vertex AI) |
| Vector Database | ChromaDB |
| Emergency Classifier | DistilBERT (fine-tuned) |
| Voice Input | Web Speech API (browser-native) |
| Deployment | Docker + Kubernetes (GKE Autopilot) |
| IaC | Pulumi (Python) |
| CI/CD | GitHub Actions |

Project Structure

firstaid-llm/
├── docker-compose.yml              # Local orchestration
├── Dockerfile                      # Base CI image
├── pyproject.toml                  # Root dependencies
├── Makefile                        # Build shortcuts
├── README.md                       # This file
│
├── secrets/                        # GCP credentials (gitignored)
│   └── llm-service-account.json
│
├── .github/
│   └── workflows/
│       ├── ci.yml                  # CI/CD pipeline
│       └── ml-training.yml         # ML training workflow
│
├── deployment/                     # Kubernetes deployment
│   ├── pulumi/
│   │   ├── __main__.py             # Pulumi entry point
│   │   ├── config.py               # Configuration helpers
│   │   ├── gke_cluster.py          # GKE cluster definition
│   │   ├── artifact_registry.py    # Container registry
│   │   ├── k8s_resources/          # Kubernetes manifests
│   │   │   ├── namespace.py
│   │   │   ├── chromadb.py
│   │   │   ├── frontend.py
│   │   │   ├── classifier.py
│   │   │   ├── services.py
│   │   │   └── hpa.py              # Horizontal Pod Autoscalers
│   │   └── README.md
│   └── scripts/
│       ├── deploy.sh               # One-click deploy
│       ├── destroy.sh              # Teardown
│       └── push-images.sh          # Build & push images
│
├── src/
│   ├── datapipeline/               # ETL pipeline
│   │   ├── Dockerfile
│   │   ├── run_pipeline.sh
│   │   ├── preprocess_rag.py       # Fetch & chunk documents
│   │   ├── embed_index.py          # Generate embeddings
│   │   ├── load_embeddings.py      # Load to ChromaDB
│   │   ├── postload_rag.py         # CLI query interface
│   │   └── data/data.yaml          # 99+ source definitions
│   │
│   ├── frontend/                   # Flask web application
│   │   ├── Dockerfile
│   │   ├── app_flask.py            # Main Flask app
│   │   ├── templates/              # Jinja2 templates
│   │   │   ├── base.html
│   │   │   ├── home.html
│   │   │   ├── assistant.html
│   │   │   ├── sources.html
│   │   │   └── about.html
│   │   ├── static/
│   │   │   ├── css/main.css
│   │   │   ├── js/main.js
│   │   │   └── assets/
│   │   └── pyproject.toml
│   │
│   └── models/
│       ├── Dockerfile
│       ├── retrieve.py             # ChromaDB retrieval
│       ├── compose.py              # Gemini answer composer
│       └── emergency_classifier/   # Emergency detection model
│           ├── Dockerfile          # Multi-stage (api/training)
│           ├── api.py              # FastAPI endpoint
│           ├── inference.py        # Classification logic
│           ├── train.py            # Model training
│           └── README.md
│
├── tests/
│   ├── unit/                       # Unit tests
│   ├── integration/                # Integration tests
│   ├── frontend/                   # Frontend tests
│   └── system/                     # System tests
│
└── docs/
    ├── ML_WORKFLOW.md              # ML training pipeline
    ├── TEST_COVERAGE.md            # Test coverage analysis
    └── application_design/

CI/CD Pipeline

Pipeline Overview

Push/PR → Build → Lint → Tests → [main branch only] → Deploy to GKE

CI Jobs (All Branches)

| Job | Description | Time |
| --- | --- | --- |
| build | Build Docker images, push to GHCR | ~3 min |
| lint-and-format | Black formatter, Flake8 linter | ~1 min |
| unit-tests | Unit tests with coverage | ~2 min |
| integration-tests | Integration tests | ~2 min |
| frontend-tests | Frontend component tests | ~1 min |
| chroma-system-tests | ChromaDB integration | ~2 min |
| emergency-classifier-system-tests | Classifier API tests | ~3 min |

CD Jobs (Main Branch Only)

| Job | Description |
| --- | --- |
| deploy-images | Build and push to GCP Artifact Registry |
| deploy-gke | Deploy to GKE using Pulumi |

Test Coverage

| Module | Coverage | Status |
| --- | --- | --- |
| datapipeline/chunk_data.py | 60% | Checked |
| datapipeline/clean_chunks.py | 73% | Checked |
| datapipeline/utils.py | 76% | Checked |
| models/compose.py | 95% | Checked |
| models/emergency_classifier/dataset.py | 98% | Checked |
| Overall | 78% | Exceeds 60% requirement |

Required Secrets

| Secret | Purpose |
| --- | --- |
| FIRSTAID_LLM | GCP service account JSON |
| PULUMI_ACCESS_TOKEN | Pulumi Cloud authentication |
| SLACK_WEBHOOK_URL | (Optional) Slack notifications |

Known Issues and Limitations

Functional Limitations

| Limitation | Description | Workaround |
| --- | --- | --- |
| No Image Analysis | Cannot analyze photos of injuries | Describe the injury in text |
| No Real-Time Updates | Medical guidelines require manual updates | Run data pipeline periodically |
| Session-Based Chat | Chat history is lost on page refresh | Export chat before leaving |

Technical Limitations

| Issue | Description | Status |
| --- | --- | --- |
| Apple Silicon Docker | Emergency classifier requires platform: linux/amd64 | Handled in docker-compose.yml |
| Cold Start Latency | First query may take 5-10 seconds | Model caching after first request |
| Voice Input Browser Support | Web Speech API not supported in Firefox | Use Chrome, Edge, or Safari |
| ChromaDB Single Instance | No horizontal scaling for vector DB | Sufficient for current load |

Known Issues

| Bug | Description | Status |
| --- | --- | --- |
| Voice button may not reset | After voice error, button may stay in "stop" state | Refresh page to reset |

Emergency Classifier Thresholds

| Confidence Level | Behavior |
| --- | --- |
| ≥ 70% | Show full-screen emergency modal with 911 guidance |
| 40% to < 70% | Show inline warning banner |
| < 40% | No emergency indication |

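Note: The classifier may produce false positives for negated phrases such as "I'm not having an emergency", because it can react to emergency-related keywords regardless of context. The confidence thresholds above reduce the impact of these cases but do not eliminate them.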
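
The threshold table can be read as a simple mapping from confidence to UI behavior. The sketch below is illustrative and not the frontend's actual code: it assumes `confidence` is the emergency probability in the range 0 to 1 (as in the API examples), that a value of exactly 0.70 triggers the modal, and the function name and return labels are hypothetical.

```python
def emergency_action(confidence: float) -> str:
    """Map classifier confidence (0 to 1) to a UI action label."""
    if confidence >= 0.70:
        return "modal"   # full-screen emergency modal with 911 guidance
    if confidence >= 0.40:
        return "banner"  # inline warning banner
    return "none"        # no emergency indication
```

With the sample responses shown earlier, a confidence of 0.94 maps to the modal and 0.12 maps to no indication.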

Data Sources

  • Total Sources: 99+ authoritative medical guidelines
  • Organizations: ILCOR, IFRC, WHO, AHA, American Red Cross, CDC, NHS, and more
  • Last Updated: Check src/datapipeline/data/data.yaml for source list

Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| Invalid JWT signature | Verify ./secrets/llm-service-account.json matches your GCP project |
| 403 PERMISSION_DENIED on Vertex AI | Enable Vertex AI API and grant Vertex AI User role to service account |
| 403/404 during preprocessing | Some URLs block automation; check _preprocessing_stats.json |
| ChromaDB healthcheck fails | Ensure port 8000 is free: lsof -i :8000 |
| Voice transcription fails | Use Chrome/Edge/Safari; Firefox doesn't support Web Speech API |
| PyTorch architecture error (Apple Silicon) | Already handled: platform: linux/amd64 in docker-compose.yml |
| Network already exists error | Run docker network rm firstaid-llm-rag-network then retry |
| GKE cluster creation timeout | GKE Autopilot clusters take 10-15 minutes; check GCP Console |
| Image pull errors on GKE | Verify images are pushed: gcloud artifacts docker images list ... |
| LoadBalancer pending | External IP allocation takes 2-5 minutes |
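
Where `lsof` is not available, a port conflict such as the ChromaDB one can also be checked from Python. This is a minimal sketch using only the standard library; the function name `port_in_use` is hypothetical, and it only detects listeners that accept TCP connections on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Before starting the stack, `port_in_use(8000)` returning True suggests another process already occupies the ChromaDB port.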

Debug Commands

# Check service health
curl http://localhost:5000/health
curl http://localhost:8100/health
curl http://localhost:8000/api/v2/heartbeat

# View Docker logs
docker compose logs -f frontend
docker compose logs -f emergency-classifier-api
docker compose logs -f chromadb

# Kubernetes debugging
kubectl get pods -n firstaid
kubectl describe pod -n firstaid <pod-name>
kubectl logs -n firstaid -l app=frontend
kubectl get events -n firstaid --sort-by='.lastTimestamp'

Team

FirstAiders - Harvard AC215 Fall 2025

| Member | Responsibilities |
| --- | --- |
| Zhaocheng (Harry) Yang | RAG Architecture, Frontend, Voice Integration, Deployment |
| Ivan Gutierrez | Emergency Classifier, ML Training Pipeline, Frontend, Deployment |
| Sibo Zhou | CI/CD, Testing, Frontend, Data Versioning, Medium Post |
| Shupeng Luxu | Data Pipeline, Source Collection, Tutorial Video |

Disclaimer

This application provides educational first-aid guidance only. It is NOT a substitute for professional medical care. In emergencies, always call 911 (US), 999 (UK), 112 (EU), or your local emergency number.

The information provided by FirstAid LLM:

  • Is for educational purposes only
  • Does not establish a doctor-patient relationship
  • May not be current or complete

Always seek professional medical attention for serious injuries or medical emergencies.


License

This project is developed for academic purposes as part of Harvard's AC215 course.

