BioTarget: End-to-End AI Drug Discovery Pipeline 🧬💊

BioTarget is an open-source CLI pipeline designed to accelerate the early stages of the AI drug-discovery workflow. It links target discovery, 3D protein structure prediction, deep-learning-based contrastive molecular screening, and physics-based CNN docking into a single framework. The pipeline uses DrugCLIP (a dual-encoder graph-text architecture) as a generative filter for toxicity and therapeutic intent, and gnina for structure-aware binding affinity predictions.

▶️ Watch the demo on YouTube

🚀 Quick install

pip install biotarget

NOTE: Installing GNINA is strongly recommended if you have an NVIDIA GPU.

🔬 Ready for running

biotarget run full \
  --disease "Alzheimer" \
  --target-model hetero-gnn \
  --structure-engine openfold3 \
  --binding-engine gnina \
  --top-ligands 10

Installation for customization

BioTarget requires Python 3.9+ and PyTorch.

0. Install GNINA (Docker-based dependency)

Before installing Python dependencies, set up the GNINA Docker environment:

chmod +x scripts/install_gnina_docker.sh
./scripts/install_gnina_docker.sh

Requirements:

Docker installed and running
NVIDIA GPU recommended
nvidia-container-toolkit for GPU acceleration

1. Base Installation

git clone https://github.com/homerquan/biotarget.git
cd biotarget

python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

2. Install DrugCLIP

pip install git+https://github.com/homerquan/drugclip.git

3. Protein Structure Sources

Default: AlphaFold Protein Structure Database
Optional: OpenFold weights placed in:

~/.biotarget/openfold3_weights/
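
A quick pre-flight check for the optional OpenFold weights can be sketched as follows. The directory is the one given above; the helper name `openfold_weights_present` is hypothetical and not part of BioTarget. Because the expected weight file names are not documented here, the sketch only tests that the directory exists and is non-empty.

```python
from pathlib import Path

# Directory named in this README. The helper below is an illustrative
# assumption, not a BioTarget API.
WEIGHTS_DIR = Path.home() / ".biotarget" / "openfold3_weights"


def openfold_weights_present(weights_dir: Path = WEIGHTS_DIR) -> bool:
    """Return True if the directory exists and holds at least one entry."""
    return weights_dir.is_dir() and any(weights_dir.iterdir())


if __name__ == "__main__":
    if openfold_weights_present():
        print(f"Found OpenFold weights in {WEIGHTS_DIR}")
    else:
        print(f"No OpenFold weights found in {WEIGHTS_DIR}")
```

This does not validate the weights themselves; it only catches the common case of a missing or empty directory before a long run starts.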

🧩 Pipeline Architecture

Stage A: Disease → Target Ranking

Data: Open Targets, DisGeNET, STRING, Reactome
Method: heterogeneous graph neural networks

Stage B: Protein Structure Generation

PDB structures when available
OpenFold predictions otherwise

Stage C: Candidate Generation

Text to candidate drug molecules (each represented as a graph), using the trained DrugCLIP model (project page: https://github.com/homerquan/DrugCLIP)
Produces filtered candidate set

Stage D: Binding and Toxicity Evaluation

GNINA docking (CNN-based scoring)
Embedding-based toxicity proxy

Stage E: Ranking

Final score:

S_final = S_binding - 0.5 * S_tox

Outputs ranked candidate molecules for downstream simulation. Note: the binding score is a rough estimate, useful only for filtering out poor candidates.
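
The ranking rule above can be illustrated with a minimal sketch. It assumes that S_binding is a normalized score in [0, 1] (the example output further down seems to print such a value in parentheses next to the pK_d) and that S_tox is a penalty in the same range. The `Candidate` class, the `rank` function, and the input values are all made up for illustration and are not taken from the BioTarget code.

```python
from dataclasses import dataclass

TOX_WEIGHT = 0.5  # coefficient from S_final = S_binding - 0.5 * S_tox


@dataclass
class Candidate:
    smiles: str
    s_binding: float  # assumed: binding score normalized to [0, 1]
    s_tox: float      # assumed: toxicity penalty in [0, 1]

    @property
    def s_final(self) -> float:
        """Final score: binding minus the weighted toxicity penalty."""
        return self.s_binding - TOX_WEIGHT * self.s_tox


def rank(candidates):
    """Sort candidates from highest to lowest final score."""
    return sorted(candidates, key=lambda c: c.s_final, reverse=True)


if __name__ == "__main__":
    # Illustrative values only, not pipeline results.
    pool = [
        Candidate("CCO", s_binding=0.90, s_tox=0.40),
        Candidate("CC(=O)O", s_binding=0.80, s_tox=0.00),
        Candidate("c1ccccc1", s_binding=0.95, s_tox=0.80),
    ]
    for i, c in enumerate(rank(pool), start=1):
        print(f"#{i} final={c.s_final:.2f} {c.smiles}")
```

With a weight of 0.5, a strong binder with a high toxicity penalty can rank below a weaker but cleaner molecule, which is the intended effect of the penalty term.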

🔬 Running the BioTarget Pipeline

The pipeline is invoked via the unified biotarget/cli.py orchestrator (or via the biotarget command if installed globally).

To execute the end-to-end pipeline for a specific disease:

python biotarget/cli.py run full \
  --disease "Alzheimer" \
  --target-model hetero-gnn \
  --structure-engine openfold3 \
  --binding-engine gnina \
  --top-targets 3 \
  --top-ligands 10
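
For batch runs it can be convenient to drive the same command from Python. The sketch below only reuses the flags shown above; the function names `build_command` and `run_pipeline` are hypothetical, and it assumes the script is started from the repository root so that `biotarget/cli.py` resolves.

```python
import shlex
import subprocess
import sys


def build_command(disease, top_targets=3, top_ligands=10):
    """Assemble the argv list for the full pipeline run shown above."""
    return [
        sys.executable, "biotarget/cli.py", "run", "full",
        "--disease", disease,
        "--target-model", "hetero-gnn",
        "--structure-engine", "openfold3",
        "--binding-engine", "gnina",
        "--top-targets", str(top_targets),
        "--top-ligands", str(top_ligands),
    ]


def run_pipeline(disease, **kwargs):
    """Run the pipeline; check=True raises CalledProcessError on failure."""
    return subprocess.run(build_command(disease, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(shlex.join(build_command("Alzheimer")))
```

Passing the arguments as a list avoids shell quoting problems with disease names that contain spaces. Call `run_pipeline("Alzheimer")` to actually start a run.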

Example Output

[Stage A] Disease -> Protein Target Ranking
[*] Querying Open Targets & DisGeNET for 'Alzheimer'...
[*] Found 3 highly ranked targets.

[Stage B] Protein Structure Generation
[*] Using engine: openfold3
[*] Folding GBA (P04062) with OpenFold-3...

[Stage C] Generative AI: De Novo Candidate Generation
[*] Generating 3000 de novo molecular structures...
[*] Generating 3D conformers for the generative pool using 64 CPU cores...
[*] Using DrugCLIP to guide selection of the top 100 generated candidates...
[*] Successfully finalized 10x generative candidate pool (N=100).

[Stage D] Binding Evaluation (gnina) & Toxicity Filtering (DrugCLIP)
[*] Loaded Target Receptor: GBA from Stage B (/runs/structures/GBA_openfold3.pdb)
[*] Computing Toxicity penalties for 100 candidates via DrugCLIP...
[*] Executing 'gnina' structure-aware docking & CNN scoring on 100 candidates...

[Stage E] Reporting
=====================================================================================
BIOTARGET PIPELINE FINAL RESULTS FOR: 'Alzheimer'
=====================================================================================
Rank  | Final  | Gnina (pK_d) | Tox Penalty   | SMILES
-------------------------------------------------------------------------------------
#1    | 0.9944 | 9.4457 (0.99) | 0.0000 OK      | CCC1(C(C)(C)C)CCOC1=O...
#2    | 0.8108 | 8.9903 (0.91) | 0.2005 OK      | COc1ccccc1N=C(S)N(CCN1CCOCC1)Cc1ccc...
#3    | 0.7631 | 9.2345 (0.96) | 0.3852 OK      | CCOC(=O)C1CCCN(c2c(NCCCN(C)Cc3ccccc...
#4    | 0.5101 | 8.8713 (0.87) | 0.7225 ⚠️ HIGH | CCCC(N=C(S)NCC1CCCO1)C12CC3CC(CC(C3...
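
To post-process a report like the one above, the result rows can be parsed into records. This is a sketch that assumes the column layout of this example (rank, final score, pK_d with a parenthesized normalized value, toxicity penalty with a flag, SMILES); the layout may differ between versions, and `ResultRow` and `parse_row` are hypothetical names. SMILES strings truncated with `...` in the report are kept as printed.

```python
import re
from typing import NamedTuple, Optional


class ResultRow(NamedTuple):
    rank: int
    final: float
    pkd: float
    binding_norm: float  # value printed in parentheses after the pK_d
    tox: float
    tox_flag: str        # e.g. "OK", or a warning marker followed by "HIGH"
    smiles: str


# Assumed layout: "#1 | 0.9944 | 9.4457 (0.99) | 0.0000 OK | SMILES"
ROW_RE = re.compile(
    r"^#(\d+)\s*\|\s*(-?[\d.]+)\s*\|\s*([\d.]+)\s*\(([\d.]+)\)"
    r"\s*\|\s*([\d.]+)\s+(.*?)\s*\|\s*(\S+)\s*$"
)


def parse_row(line: str) -> Optional[ResultRow]:
    """Parse one result row; return None for headers and separators."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    rank, final, pkd, norm, tox, flag, smiles = m.groups()
    return ResultRow(int(rank), float(final), float(pkd),
                     float(norm), float(tox), flag, smiles)


if __name__ == "__main__":
    line = "#1    | 0.9944 | 9.4457 (0.99) | 0.0000 OK      | CCC1(C(C)(C)C)CCOC1=O..."
    print(parse_row(line))
```

Lines that do not match the row pattern, such as the banner and the column header, return `None` and can simply be skipped.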

🛠 Model Extensibility (Roadmap)

While this framework establishes an AI-driven core, it is intentionally modular to support integration with downstream biophysics tools:

Generative Expansion: Improve the generated candidate subset with an active autoregressive or diffusion-based generative model, or an evolutionary algorithm, to produce more diverse candidates and enable closed-loop optimization.
Enhanced Simulation: Evaluate the final shortlist of candidates with high-fidelity simulations, for example molecular dynamics (MD), to predict effectiveness and potential side effects more accurately. See https://en.wikipedia.org/wiki/Molecular_dynamics and https://github.com/NVIDIA/nvalchemi-toolkit
