PillChecker API

PillChecker helps users find out if two medications are safe to take at the same time. This repository contains the backend API that identifies drugs from OCR text and checks for dangerous interactions using DrugBank pharmaceutical data.

MEDICAL DISCLAIMER

This service is provided for informational and self-educational purposes only. While the application utilizes data from respected pharmaceutical sources, the information provided should not be treated as medical advice, diagnosis, or treatment.

The developer of this project does not have any medical qualifications. This tool was built as a technical exercise to explore NLP and medical data integration.

Always consult with a qualified healthcare professional (such as a doctor or pharmacist) before making any decisions regarding your medications or health. The developer assumes no responsibility or liability for any errors, omissions, or consequences arising from the use of the information provided by this service.

Architecture

Drug Identification

Converts unstructured OCR text into standardized drug records using a multi-step strategy:

  1. OCR Cleaning: The ocr_cleaner normalizes common OCR artifacts before NER: digit/letter confusions (0/o, 1/l), rn/m misreads in drug names, ligatures, invisible characters, and irregular whitespace.
  2. NER: The OpenMed-NER-PharmaDetect-BioPatient-108M model (108M parameters) extracts chemical entity names from the cleaned text.
  3. Fallback: If NER yields no results, an approximate term search via the RxNorm REST API catches brand names (e.g., "Advil" -> ibuprofen).
  4. Enrichment: A regex parser extracts dosages (e.g., "400 mg"), and the RxNorm API maps every identified drug to its RxCUI for standardized downstream lookups.
  5. Confidence: Results with an NER confidence score below 0.85, or sourced from the RxNorm fallback, are flagged with needs_confirmation = true to prompt user verification.
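A minimal sketch of the kind of normalization step 1 describes. The function name and substitution table below are illustrative assumptions, not the repo's actual ocr_cleaner; a production cleaner would apply the rn/m fix context-sensitively rather than blindly.

```python
import re
import unicodedata

# Illustrative substitutions for common OCR confusions (assumed, not the
# repo's actual table): digit/letter swaps and the "rn" -> "m" misread.
OCR_FIXES = [
    (re.compile(r"(?<=[a-z])0(?=[a-z])"), "o"),  # "ibupr0fen" -> "ibuprofen"
    (re.compile(r"(?<=[a-z])1(?=[a-z])"), "l"),  # "ty1enol"   -> "tylenol"
    # Naive for brevity: would also rewrite legitimate "rn" (e.g. "warning").
    (re.compile(r"rn"), "m"),                    # "rnetformin" -> "metformin"
]

def clean_ocr_text(text: str) -> str:
    """Normalize OCR artifacts before running NER."""
    # Fold ligatures (e.g. "fi" ligature -> "fi") and drop invisible characters.
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\u200b", "").replace("\ufeff", "")
    for pattern, repl in OCR_FIXES:
        text = pattern.sub(repl, text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```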

Interaction Checking

Drug-drug interactions are resolved against the DrugBank pharmaceutical database via a vendored MCP server:

  1. DrugBank MCP server: A Node.js process (vendored under drugbank-mcp-server/) communicates over stdio using the Model Context Protocol. It serves a pre-built SQLite database (~17,400 drugs) with structured pairwise interaction data.
  2. Bidirectional lookup: For each drug pair, the checker queries both directions (A->B and B->A) in parallel using asyncio.gather().
  3. Severity classification: Interaction descriptions are first parsed by a deterministic template parser that matches regex patterns in DrugBank text. If the parser cannot determine severity, a DeBERTa v3 zero-shot classifier is used as fallback. Unknown severity defaults to major with uncertain = true.
  4. Caching: DrugBank interaction records are cached in-process for 4 hours; RxNorm lookups are cached for 24 hours.

Transparency

Both /analyze and /interactions responses include:

  • data_sources: which models and databases were used for the result
  • limitations (interactions only): scope disclaimers about what the system does and does not cover
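An illustrative shape for these fields. The data_sources and limitations keys come from the description above; every other key and value here is an assumption, not the actual response schema.

```json
{
  "interactions": ["..."],
  "data_sources": {
    "ner_model": "OpenMed-NER-PharmaDetect-BioPatient-108M",
    "interaction_database": "DrugBank (vendored MCP server)",
    "drug_normalization": "RxNorm REST API"
  },
  "limitations": [
    "Covers only pairwise drug-drug interactions present in DrugBank.",
    "Does not account for dosage, food interactions, or patient-specific factors."
  ]
}
```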

Docker Build

The image uses a three-stage build to keep the final image small and the build reproducible:

  • Stage 1 (Python): uv installs Python dependencies into an isolated venv.
  • Stage 2 (Node.js): npm ci installs Node dependencies; the DrugBank SQLite database is downloaded from GitHub Releases.
  • Stage 3 (Runtime): Combines the venv, Node binary, and built MCP server. NER and severity models are pre-downloaded so the image is fully self-contained.
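A condensed sketch of the three-stage layout described above. Stage names, base images, paths, the model-download module, and the elided release URL are all illustrative assumptions, not the repo's actual Dockerfile.

```dockerfile
# Stage 1: Python dependencies into an isolated venv via uv
FROM python:3.12-slim AS python-deps
RUN pip install uv
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen

# Stage 2: Node dependencies and the pre-built DrugBank SQLite database
FROM node:20-slim AS node-deps
WORKDIR /mcp
COPY drugbank-mcp-server/package*.json ./
RUN npm ci
# Database fetched from GitHub Releases (URL elided)
ADD https://github.com/SPerekrestova/pillchecker-api/releases/download/... ./drugbank.sqlite

# Stage 3: runtime combines the venv, Node binary, and MCP server
FROM python:3.12-slim
COPY --from=python-deps /.venv /.venv
COPY --from=node-deps /usr/local/bin/node /usr/local/bin/node
COPY --from=node-deps /mcp /app/drugbank-mcp-server
# Pre-download NER and severity models so the image is fully self-contained
RUN /.venv/bin/python -m scripts.download_models
```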

API Endpoints

| Method | Path                | Auth    | Description                                  |
|--------|---------------------|---------|----------------------------------------------|
| GET    | /health             | No      | Liveness check                               |
| GET    | /health/data        | No      | Readiness check: confirms DrugBank MCP connection |
| POST   | /analyze            | API key | Extract drugs from OCR text                  |
| POST   | /interactions       | API key | Check interactions for a list of drug names  |
| POST   | /admin/cache/clear  | API key | Clear all in-memory caches                   |
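A minimal client call against the /interactions endpoint using only the standard library. The X-API-Key header name and the {"drugs": [...]} body shape are assumptions about the API, not its documented schema.

```python
import json
import urllib.request

def build_interactions_request(base_url: str, api_key: str, drugs: list[str]):
    """Build the POST /interactions request (header name and body are assumed)."""
    body = json.dumps({"drugs": drugs}).encode()
    return urllib.request.Request(
        f"{base_url}/interactions",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_interactions_request(
    "http://localhost:8000", "secret", ["aspirin", "warfarin"]
)
# urllib.request.urlopen(req) would send it; omitted here to stay offline.
```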

Eval Benchmark

The eval/ directory contains a benchmark suite that measures NER accuracy on synthesized pharmaceutical pack-label text. See eval/BENCHMARK.md for methodology and results.

Dataset: 11,796 cases generated from the MattBastar/Medicine_Details HuggingFace dataset, with configurable OCR noise levels (clean, light, heavy).

| Pipeline / Noise Level       | Precision | Recall | F1    |
|------------------------------|-----------|--------|-------|
| Bare NER (Clean)             | 46.9%     | 84.4%  | 60.3% |
| Bare NER (Light Noise)       | 44.9%     | 79.8%  | 57.5% |
| Bare NER (Heavy Noise)       | 26.2%     | 53.5%  | 35.2% |
| Full Pipeline (Clean)        | 71.6%     | 81.0%  | 76.0% |
| Full Pipeline (Light Noise)  | 74.4%     | 79.8%  | 77.0% |
| Full Pipeline (Heavy Noise)  | 65.6%     | 47.6%  | 55.2% |
```shell
uv run python eval/prepare_hf_dataset.py     # generate dataset
uv run python eval/benchmark.py --limit 500  # run benchmark
```
