# Real-Time Drug Safety Intelligence via Hybrid RAG
A production-style REST API that automates drug safety research for pharmacovigilance analysts. It combines live FDA adverse event data, real-time PubMed literature, and a pre-built biomedical vector knowledge base to deliver AI-synthesized drug safety assessments in seconds — replacing a process that traditionally takes 4–8 hours.
- 1. Problem Statement
- 2. Our Solution
- 3. How It Works — System Architecture
- 4. What We Built — Actions Taken
- 5. Results & Validation
- 6. Technical Stack — Why Each Tool Was Chosen
- 7. Getting Started
- 8. API Endpoints
- 9. Deployment
- 10. Future Scope
## 1. Problem Statement

Every drug that reaches the market must be continuously monitored for unexpected side effects — a regulated practice called pharmacovigilance. When a safety analyst suspects that a drug is causing a new adverse reaction, the investigation requires:
- Searching the FDA's FAERS database (millions of raw adverse event reports) for statistical patterns
- Cross-referencing those patterns with peer-reviewed medical literature on PubMed
- Writing a structured safety assessment that distinguishes confirmed signals from noise
This process is performed entirely by hand. For a single drug query, a trained analyst typically spends 4 to 8 hours pulling numbers from one system, abstracts from another, and then synthesizing it all into a coherent report.
Three specific problems make this workflow painful:
- Data fragmentation: FDA event data and PubMed literature exist in completely separate, unconnected systems with different query interfaces.
- Static knowledge goes stale: Drug safety data changes weekly. A pre-built knowledge base without live data is unreliable for current signal detection.
- Context without synthesis is useless: Knowing "47 heart failure reports were filed" means nothing without the clinical context to interpret whether that count is elevated relative to the drug's mechanism and patient population.
## 2. Our Solution

MedSignal is an API that collapses the entire pharmacovigilance research workflow into a single HTTP request.
A user sends a natural language question — for example, "What cardiac adverse events have been reported for semaglutide in the last 6 months in patients over 65?" — and receives back:
- Live FDA adverse event statistics (reaction counts, outcome severity, demographic breakdown)
- Relevant PubMed abstracts from the last indexed period
- Synthesized passages from a pre-built biomedical knowledge base
- A structured AI-generated safety assessment with confidence score and formatted citations
The core insight is Hybrid RAG (Retrieval-Augmented Generation): instead of relying solely on a pre-built static index (which goes stale) or solely on live APIs (which are unstructured), we fan out across three sources simultaneously and merge the results before handing them to the language model. This ensures the LLM always has both current data and deep pre-indexed biomedical context.
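The fan-out-and-merge pattern is the heart of the design and can be sketched in a few lines of asyncio. The function names below are illustrative stand-ins for the project's retrieval services, and the sleeps simulate network latency:

```python
import asyncio

# Stand-ins for the three retrieval services; each would normally make an
# async HTTP call or vector-store lookup.
async def fetch_openfda(drug: str) -> dict:
    await asyncio.sleep(0.1)
    return {"source": "openfda", "drug": drug}

async def fetch_pubmed(drug: str) -> dict:
    await asyncio.sleep(0.1)
    return {"source": "pubmed", "drug": drug}

async def search_vector_store(drug: str) -> dict:
    await asyncio.sleep(0.1)
    return {"source": "faiss", "drug": drug}

async def retrieve_all(drug: str) -> list[dict]:
    # All three coroutines run concurrently, so total wall time is roughly
    # the slowest single call rather than the sum of all three.
    return list(await asyncio.gather(
        fetch_openfda(drug),
        fetch_pubmed(drug),
        search_vector_store(drug),
    ))

results = asyncio.run(retrieve_all("semaglutide"))
print([r["source"] for r in results])  # → ['openfda', 'pubmed', 'faiss']
```

`asyncio.gather()` returns results in the order the coroutines were passed, regardless of which finishes first, so downstream code can rely on a stable layout.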
## 3. How It Works — System Architecture

```text
User / Analyst
      │
      │  POST /api/v1/query
      ▼
┌────────────────────────────────────────────────────────┐
│                    FastAPI Gateway                     │
│   (Pydantic validation → drug whitelist + date         │
│    format + chronology checks before any API call)     │
└──────────┬─────────────────────────────────────────────┘
           │
           │  asyncio.gather() — all three run in parallel
           ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                        Parallel Retrieval Fan-Out                        │
│                                                                          │
│ ┌─────────────────────┐ ┌─────────────────────┐ ┌────────────────────┐   │
│ │     openFDA API     │ │ PubMed E-utilities  │ │  FAISS Vector DB   │   │
│ │       (Live)        │ │       (Live)        │ │   (Static Index)   │   │
│ │                     │ │                     │ │                    │   │
│ │ - Total report count│ │ - MeSH-term search  │ │ - ~1,500 embedded  │   │
│ │ - Top reactions     │ │ - Abstracts (up to  │ │   PubMed abstracts │   │
│ │ - Outcome severity  │ │   5 most relevant)  │ │   per drug         │   │
│ │ - Sex demographics  │ │ - PMID + pub date   │ │ - Cosine sim. ≥    │   │
│ │ - Optional filters: │ │ - XML parsed        │ │   0.45 threshold   │   │
│ │   date_range,       │ │   server-side       │ │ - S-PubMedBert     │   │
│ │   age_group         │ │                     │ │   embeddings       │   │
│ └─────────────────────┘ └─────────────────────┘ └────────────────────┘   │
└───────────────────────────────────┬──────────────────────────────────────┘
                                    │
                                    ▼
                         ┌─────────────────────┐
                         │   Context Merger    │
                         │                     │
                         │ - Deduplicates by   │
                         │   PMID across all   │
                         │   three sources     │
                         │ - Truncates long    │
                         │   abstracts to      │
                         │   preserve context  │
                         │   window budget     │
                         │ - Formats as        │
                         │   labeled sections  │
                         └──────────┬──────────┘
                                    │
                                    ▼
                     ┌──────────────────────────────┐
                     │  Groq LLM (Llama 3.3-70B)    │
                     │                              │
                     │  System prompt enforces:     │
                     │  - Evidence-only responses   │
                     │  - Structured output format  │
                     │  - Confidence score (0–1)    │
                     │  - No hallucinated citations │
                     └──────────────┬───────────────┘
                                    │
                                    ▼
                         JSON Response with:
                         - synthesized_assessment
                         - adverse_events (counts + stats)
                         - literature_context (articles used)
                         - citations (formatted references)
                         - confidence_score
                         - metadata (latency, sources used)
```
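The merger stage is simple enough to sketch directly. This is an illustrative version, not the project's `context_merger.py`; the field names and the 1,500-character truncation limit are assumptions for the sketch:

```python
def merge_contexts(live_pubmed: list[dict], faiss_hits: list[dict],
                   max_chars: int = 1500) -> list[dict]:
    """Deduplicate abstracts by PMID and truncate long ones.

    Live PubMed results take precedence; any FAISS hit whose PMID was
    already seen is dropped, so the LLM never reads the same study twice.
    """
    seen: set = set()
    merged: list[dict] = []
    for doc in list(live_pubmed) + list(faiss_hits):
        pmid = doc.get("pmid")
        if pmid in seen:
            continue
        seen.add(pmid)
        # Truncate to protect the LLM context-window budget.
        merged.append({**doc, "abstract": doc.get("abstract", "")[:max_chars]})
    return merged

live = [{"pmid": "111", "abstract": "A" * 2000}]
static = [{"pmid": "111", "abstract": "duplicate"}, {"pmid": "222", "abstract": "B"}]
merged = merge_contexts(live, static)
print(len(merged))  # → 2 (the duplicate PMID 111 from FAISS is dropped)
```

Ordering live results first means the freshest copy of a duplicated study is the one that survives deduplication.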
## 4. What We Built — Actions Taken

Before the API can serve requests, a one-time offline pipeline builds the static knowledge base:
- `fetch_pubmed.py`: Queries PubMed E-utilities for up to 1,500 abstracts per target drug using relevance-sorted search. Fetches in batches of 50 (to respect URL length limits), with exponential backoff retries via `tenacity` and rate limiting (0.35–0.5 s between requests to respect NCBI's API policy).
- `build_index.py`: Loads the fetched abstracts, embeds each document using `S-PubMedBert-MS-MARCO` (a biomedical-domain sentence transformer), and indexes all embeddings into a FAISS `IndexFlatIP` (inner product = cosine similarity on normalized vectors). The index and a JSON metadata file are written to disk and loaded into memory at API startup.
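The reason `IndexFlatIP` doubles as a cosine-similarity index is that inner product equals cosine similarity on L2-normalized vectors. A numpy-only sketch of that retrieval math, with random vectors standing in for real S-PubMedBert embeddings and a brute-force dot product standing in for FAISS (the 0.45 floor mirrors the threshold used in the architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v: np.ndarray) -> np.ndarray:
    # L2-normalize rows so that inner product == cosine similarity
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Stand-in corpus: 1,500 documents embedded into 768-dim vectors
# (random here; the real pipeline uses S-PubMedBert embeddings).
corpus = normalize(rng.standard_normal((1500, 768)).astype("float32"))
query = normalize(rng.standard_normal((1, 768)).astype("float32"))

# Equivalent of IndexFlatIP.search(query, k=5): exact inner products
scores = corpus @ query.T                 # shape (1500, 1)
top_k = np.argsort(-scores[:, 0])[:5]     # indices of the 5 best matches

# Apply the similarity floor so weak matches are dropped entirely
hits = [(int(i), float(scores[i, 0])) for i in top_k if scores[i, 0] >= 0.45]
```

With random vectors the 0.45 floor typically filters out every candidate; real biomedical embeddings of semantically related abstracts score well above it, which is exactly what the threshold is for.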
- Three retrieval services (`openfda.py`, `pubmed.py`, `vector_store.py`) are each isolated modules with their own async clients and error handling. The API uses `asyncio.gather()` to hit all three concurrently — no request waits on another.
- `context_merger.py` deduplicates results by PMID across live PubMed and static FAISS results (so the LLM never sees the same study twice), truncates long abstracts to stay within the LLM context budget, and formats everything as labeled sections for the prompt.
- `llm.py` sends the merged context to Groq with a strict system prompt that forbids hallucination, requires structured output, and instructs the model to end every response with a parseable `CONFIDENCE_SCORE:` line.
- Pydantic schemas validate every incoming request: drug names are whitelisted against a configured list, and date ranges must be in `YYYYMMDD+TO+YYYYMMDD` format with chronological validity enforced before any downstream API call is made.
- `RequestLoggingMiddleware` attaches a UUID request ID to every request and logs structured JSON with method, path, status code, and latency in milliseconds.
## 5. Results & Validation

An adversarial test suite of 14 edge cases was designed to probe the system's failure modes before deployment, covering paradoxical queries, hallucination bait, injection attempts, invalid inputs, and excessive query lengths. It produced the following verified outcomes:
| Test Category | Test Case | Result |
|---|---|---|
| Input Validation | Unsupported drug (`ibuprofen`) | `422` rejected in 0.01s |
| Input Validation | Made-up drug name | 422 rejected in 0.00s |
| Input Validation | Invalid date range (future start date) | 422 rejected in 0.01s |
| Hallucination Resistance | "Does metformin cause neon green hair and levitation?" | LLM confirmed zero FDA evidence; cited real GI reactions |
| Paradoxical Query | Asking if a weight-loss drug causes uncontrollable weight gain | LLM cited 360 "weight decreased" reports to correctly refute the premise |
| Off-topic Query | "Can metformin make me better at chess?" | LLM returned factual safety profile; explicitly stated no evidence for cognitive enhancement |
| SQL Injection | `'; DROP TABLE users; --` injected into query field | Request processed safely; injection treated as plain text with no code execution |
| Gibberish Input | Random characters as query | System returned a coherent safety profile; ignored unintelligible query |
| Brand Name | `ozempic` (brand name for semaglutide) | `422` — highlights a known limitation for future brand-name normalization |
| Complex Clinical Query | Comorbidity + B12 deficiency question | All 3 sources engaged; structured clinical response returned |
| Signal Report | Full pharmacovigilance report requested | 7-section formal report generated with executive summary and risk characterization |
| Typical Response Time | Standard query (first 4 test cases before rate limit) | 3.73s – 5.87s end-to-end including 3 parallel API calls + LLM synthesis |
Key takeaways:
- Input guardrails blocked 100% of invalid drug names and malformed date ranges before any API credits were spent
- The LLM remained grounded in the provided context for all hallucination-bait questions
- Parallel retrieval keeps end-to-end latency under 6 seconds for clean queries
## 6. Technical Stack — Why Each Tool Was Chosen

| Technology | Role | Why This Choice |
|---|---|---|
| FastAPI | Web framework & API gateway | Native async/await support is essential here — the core architecture depends on firing three external API calls in parallel. FastAPI's asyncio.gather() integration and automatic OpenAPI docs made it the right fit over Flask or Django. |
| httpx | Async HTTP client | Unlike requests (which is synchronous), httpx supports async natively and integrates directly with FastAPI's event loop. This is what makes the parallel fan-out possible without threading overhead. |
| FAISS (Facebook AI Similarity Search) | Vector database for semantic retrieval | FAISS is the industry standard for high-performance similarity search. For a knowledge base of ~1,500–3,000 medical abstracts, IndexFlatIP with normalized vectors gives exact cosine similarity results with microsecond query latency — no approximate-search tradeoff needed at this scale. |
| S-PubMedBert-MS-MARCO | Sentence embedding model | This model was pre-trained on biomedical PubMed corpora and fine-tuned for semantic retrieval on MS MARCO. It significantly outperforms general-purpose embeddings (like all-MiniLM) on medical vocabulary because it understands terminology like "medullary thyroid carcinoma" or "GLP-1 receptor agonist" rather than treating them as rare tokens. |
| Groq API (Llama 3.3 70B) | LLM for synthesis | Groq's custom LPU (Language Processing Unit) hardware delivers Llama 3.3-70B inference at speeds that allow complete assessment generation in ~1–3 seconds. This is 10–20x faster than running the same model on standard GPU APIs, which is critical for keeping the total query latency under 6 seconds. |
| Pydantic v2 | Request validation & schema enforcement | The whitelist validator on drug_name and the chronological validator on date_range act as a security and cost-control firewall — invalid requests are rejected at the schema layer before any external API credits are spent. |
| Tenacity | Retry logic for ingestion | The offline PubMed ingestion pipeline fetches thousands of records across hundreds of batch requests. tenacity's exponential backoff decorator handles transient network errors without crashing the entire ingestion job. |
| Docker | Containerization | The FAISS index and sentence transformer model (~440MB) must be packaged with the application. Docker ensures the full runtime — Python version, system libraries, model weights, and index files — is reproducible across environments and deployable to any container platform. |
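A minimal Dockerfile along these lines would package the service; the paths and layout are assumptions about the repository, not taken from it:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code plus the pre-built FAISS index and metadata
# (assumed to live under app/ after the ingestion scripts have run)
COPY app/ app/

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The ~440MB sentence-transformer model can either be baked into the image at build time or downloaded on first startup; baking it in gives slower builds but faster, offline-capable cold starts.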
## 7. Getting Started

- Python 3.10+
- A free Groq API key
- (Optional) API keys for openFDA and NCBI — the API works without them but at lower rate limits
```bash
# 1. Clone the repository
git clone https://github.com/nikhilreddy00/MedSignal-API.git
cd MedSignal-API

# 2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Create your .env file
cat > .env << EOF
GROQ_API_KEY=gsk_your_key_here
OPENFDA_API_KEY=  # optional
NCBI_API_KEY=     # optional
LOG_LEVEL=INFO
EOF
```

Build the knowledge base:

```bash
# Step 1: Fetch PubMed abstracts (~2–5 minutes depending on API key)
python -m app.ingestion.fetch_pubmed

# Step 2: Embed and index into FAISS (~3–10 minutes, downloads ~440MB model on first run)
python -m app.ingestion.build_index
```

Start the API:

```bash
uvicorn app.main:app --reload --port 8000
```

Navigate to http://localhost:8000/docs for the interactive Swagger UI.
## 8. API Endpoints

### `POST /api/v1/query`

Submit a natural language pharmacovigilance question.
Request body:

```json
{
  "drug_name": "semaglutide",
  "query": "What cardiac adverse events have been reported in patients over 65?",
  "date_range": "20240101+TO+20241231",
  "age_group": "65+"
}
```

Response includes: `synthesized_assessment`, `adverse_events` (counts + severity + demographics), `literature_context` (articles used), `citations`, `confidence_score`, and `metadata` (latency, sources used, models).
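Assuming a local run on port 8000, the query endpoint can be exercised from the command line (a sketch using standard curl flags; requires the server to be running):

```bash
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
        "drug_name": "semaglutide",
        "query": "What cardiac adverse events have been reported in patients over 65?",
        "date_range": "20240101+TO+20241231",
        "age_group": "65+"
      }'
```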
A report endpoint generates a 7-section pharmacovigilance signal report (Executive Summary → Signal Description → Adverse Event Analysis → Literature Review → Risk Characterization → Recommendations → Data Sources).
Request body:

```json
{
  "drug_name": "semaglutide",
  "report_type": "comprehensive"
}
```

Supported `report_type` values: `comprehensive`, `cardiac`, `hepatic`, `renal`.
A health-check endpoint returns connectivity status for all three external dependencies (openFDA, PubMed, Groq), vector store load status, document count, and active model names.
## 10. Future Scope

If scaled to a production pharmacovigilance environment:
- Automated re-indexing: Nightly CI/CD pipeline to re-fetch the latest PubMed publications and rebuild the FAISS index, so the static knowledge base stays within days of current literature.
- Brand-name normalization: Map trade names (e.g., "Ozempic", "Wegovy") to their generic INN equivalents before validation, rather than rejecting them.
- RAGAS evaluation: Automated context relevance and answer faithfulness scoring on a held-out test set after every index rebuild (target: >0.92 faithfulness).
- Expanded drug coverage: Extending `TARGET_DRUGS` beyond the current PoC scope of semaglutide and metformin to cover full therapeutic classes.
- EHR integration: Add a POST endpoint that accepts de-identified patient records and generates patient-specific drug interaction alerts by cross-referencing the existing safety signal data.