GitHub - jeffjjohnston/genomics-rare-disease-ai-workflow: An experimental AI agent workflow.

Overview

This repo contains an experimental agent-based workflow that searches a patient's genomic sequencing results for disease-causing variants, guided by a description of the patient's symptoms.

See my blog posts for additional discussion:

Getting started

Here's what you'll need to run this workflow:

An OpenAI API key
Annotated variants from Illumina Nirvana
An hp.obo file downloaded from the HPO website
A phenotype_to_genes.txt file downloaded from the HPO website

Set up the Python environment

git clone https://github.com/jeffjjohnston/genomics-rare-disease-ai-workflow.git
cd genomics-rare-disease-ai-workflow
uv venv -p python3.12
source .venv/bin/activate
uv pip install -r requirements.txt
echo OPENAI_API_KEY=YOUR_API_KEY > .env
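
The last command above stores the API key in a .env file. How the workflow scripts read that file isn't shown here. As a minimal sketch, assuming plain KEY=VALUE lines, a stdlib-only loader could look like the following (parse_env_text is a hypothetical helper, not something from this repo):

```python
import os


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Inline sample standing in for the contents of .env
sample = "# local secrets\nOPENAI_API_KEY=YOUR_API_KEY\n"
env = parse_env_text(sample)

# setdefault keeps any value that is already exported in the shell
os.environ.setdefault("OPENAI_API_KEY", env["OPENAI_API_KEY"])
```

For a real file, pass the result of reading .env as text to parse_env_text instead of the inline sample.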

Generate required resources

Build the HPO terms vector database from the downloaded hp.obo file:

mkdir -p resources/hpo_agent
python generate-hpo-index.py \
    --model cambridgeltl/SapBERT-from-PubMedBERT-fulltext \
    --obo_file hp.obo \
    --index_base resources/hpo_agent/SapBERT-PubMedBERT_hpo
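
To give a sense of what goes into such an index, here is a rough sketch of pulling term names and synonyms out of OBO-formatted text. It is not the code in generate-hpo-index.py, and it skips the embedding step entirely. The sample text is hand-written in OBO style (the last stanza is a made-up obsolete term), and parse_obo_terms is a hypothetical helper:

```python
import re

# Hand-written sample in OBO style; the last stanza is a made-up obsolete term.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: HP:0001250
name: Seizure
synonym: "Epileptic seizure" EXACT []

[Term]
id: HP:0001252
name: Hypotonia
synonym: "Low muscle tone" EXACT layperson []

[Term]
id: HP:9999999
name: Example obsolete term
is_obsolete: true
"""


def parse_obo_terms(text):
    """Return non-obsolete [Term] stanzas as dicts with id, name, synonyms."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = None
            if line == "[Term]":
                current = {"id": None, "name": None, "synonyms": [], "obsolete": False}
                terms.append(current)
            continue
        if current is None or ": " not in line:
            continue  # header lines, blank lines, or stanzas we ignore
        tag, _, value = line.partition(": ")
        if tag in ("id", "name"):
            current[tag] = value
        elif tag == "synonym":
            match = re.match(r'"(.*?)"', value)
            if match:
                current["synonyms"].append(match.group(1))
        elif tag == "is_obsolete":
            current["obsolete"] = value == "true"
    return [t for t in terms if not t["obsolete"]]


terms = parse_obo_terms(SAMPLE_OBO)
for term in terms:
    # One text string per term: the kind of input an embedding model could take.
    print(term["id"], " | ".join([term["name"], *term["synonyms"]]))
```

Whether the real script embeds names only, or names plus synonyms, is not stated here; joining them is just one option.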

Create a new DuckDB database from the Nirvana JSON:

mkdir -p patients
python add-variants.py \
    --json /path/to/variants.json.gz \
    --db patients/variants.duckdb

For instructions on running Nirvana on a VCF, see Nirvana_guide.md in this repo.
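
For orientation, the sketch below flattens a Nirvana-style JSON document into one row per variant and transcript, which is roughly the shape a variant table needs. It makes several assumptions: the field names (positions, variants, vid, transcripts, hgnc, consequence) follow my understanding of the Nirvana JSON layout, the sample document is hand-written rather than real output, and flatten_positions is a hypothetical helper. The schema that add-variants.py actually writes to DuckDB is not shown here.

```python
import json

# Hand-written, Nirvana-style sample; values are placeholders, not real annotations.
SAMPLE_JSON = """
{
  "header": {"samples": ["proband"]},
  "positions": [
    {
      "chromosome": "chr1",
      "position": 12345,
      "refAllele": "A",
      "altAlleles": ["G"],
      "variants": [
        {
          "vid": "1-12345-A-G",
          "transcripts": [
            {"transcript": "NM_000000.0", "hgnc": "GENE1",
             "consequence": ["missense_variant"]}
          ]
        }
      ]
    },
    {
      "chromosome": "chr2",
      "position": 67890,
      "refAllele": "C",
      "altAlleles": ["T"],
      "variants": [{"vid": "2-67890-C-T"}]
    }
  ]
}
"""


def flatten_positions(doc):
    """Yield one flat dict per (variant, transcript) pair."""
    rows = []
    for pos in doc.get("positions", []):
        for var in pos.get("variants", []):
            # Variants without transcript annotations still get a single row
            for tx in var.get("transcripts") or [{}]:
                rows.append({
                    "vid": var.get("vid"),
                    "chromosome": pos.get("chromosome"),
                    "position": pos.get("position"),
                    "gene": tx.get("hgnc"),
                    "consequence": ",".join(tx.get("consequence", [])),
                })
    return rows


rows = flatten_positions(json.loads(SAMPLE_JSON))
for row in rows:
    print(row)
```

A real .json.gz file could be opened with gzip.open(path, "rt") and passed to json.load, though a whole-exome file may be too large to load comfortably in one call.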

Run the workflow

First, describe your patient's symptoms in a plain text file (for example, patient_symptoms.txt).

Run the workflow:

python run-workflow.py \
    --symptoms patient_symptoms.txt \
    --hpo-db resources/hpo_agent/SapBERT-PubMedBERT_hpo.json.gz \
    --phenotypes-to-gene-file phenotype_to_genes.txt \
    --variant-db patients/variants.duckdb \
    --output results.txt
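
The phenotype_to_genes.txt file is what connects HPO terms to candidate genes. The sketch below shows one simple way to use it: count how many of the patient's HPO terms each gene is linked to. This is an illustrative heuristic, not necessarily what run-workflow.py does. It assumes a tab-separated file with a header row that includes hpo_id and gene_symbol columns (releases of the file have not always used the same layout), and the gene symbols in the sample are placeholders:

```python
import csv
import io
from collections import Counter, defaultdict

# Placeholder rows in the assumed column layout; gene symbols are invented.
SAMPLE_TSV = (
    "hpo_id\thpo_name\tncbi_gene_id\tgene_symbol\tdisease_id\n"
    "HP:0001250\tSeizure\t1\tGENE_A\tOMIM:000001\n"
    "HP:0001250\tSeizure\t2\tGENE_B\tOMIM:000002\n"
    "HP:0001252\tHypotonia\t1\tGENE_A\tOMIM:000001\n"
)


def load_hpo_to_genes(handle):
    """Map each HPO ID to the set of gene symbols associated with it."""
    mapping = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        mapping[row["hpo_id"]].add(row["gene_symbol"])
    return mapping


def rank_genes(hpo_ids, mapping):
    """Rank genes by how many of the given HPO terms they are linked to."""
    counts = Counter()
    for hpo_id in set(hpo_ids):
        # .get avoids inserting empty entries for unknown terms
        counts.update(mapping.get(hpo_id, set()))
    # Most matching terms first; ties broken alphabetically
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


mapping = load_hpo_to_genes(io.StringIO(SAMPLE_TSV))
ranked = rank_genes(["HP:0001250", "HP:0001252"], mapping)
print(ranked)
```

With the real file, replace io.StringIO(SAMPLE_TSV) with an open file handle, for example open("phenotype_to_genes.txt", newline="").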

Example data

To run the two example scenarios from the blog posts linked above, you can use the Colombian trio data, which I've made available both as the VCF from the International Genome Sample Resource site and as the Nirvana-produced JSON. The pathogenic variants added to the variant database for the two scenarios are in the examples/ directory of this repo.

Colombian trio VCF: https://downloads.newmatter.net/genomics-rare-disease-ai-workflow/colombian_trio.exome.vcf.gz
Colombian trio Nirvana JSON: https://downloads.newmatter.net/genomics-rare-disease-ai-workflow/colombian_trio.exome.json.gz

To run the first scenario:

mkdir -p patients
wget https://downloads.newmatter.net/genomics-rare-disease-ai-workflow/colombian_trio.exome.json.gz

# Create a new database with the full exome variants
python add-variants.py \
    --json colombian_trio.exome.json.gz \
    --db patients/colombian_trio.duckdb

# Inject the single annotated pathogenic variant
python add-variants.py \
    --json examples/clinvar_143754.json.gz \
    --db patients/colombian_trio.duckdb

# Run the workflow
python run-workflow.py \
    --symptoms examples/example_case_1.md \
    --hpo-db resources/hpo_agent/SapBERT-PubMedBERT_hpo.json.gz \
    --phenotypes-to-gene-file phenotype_to_genes.txt \
    --variant-db patients/colombian_trio.duckdb \
    --output example_case_1_results.txt

For the second scenario, add the examples/BCKDHA_variant.json.gz variant and use the examples/example_case_2.md symptom description file.
