Rishu-raj-02/AMR-Multi-Antibiotic-Resistance-Predictor


🧬 AMR Multi-Antibiotic Resistance Predictor

AI-powered genomic resistance prediction for E. coli: Ciprofloxacin, Ceftriaxone & Amoxicillin

Python Flask scikit-learn License: MIT


🔬 What This Does

This project is a multi-task machine learning system that predicts antibiotic resistance in E. coli genomes. Given 40 genomic features extracted from whole-genome sequencing data (gene presence/absence markers, mutation counts, and proportion scores), the model simultaneously predicts resistance outcomes for three clinically important antibiotics:

| Antibiotic | Drug Family | Treats |
|---|---|---|
| Ciprofloxacin | Fluoroquinolone | UTIs, respiratory infections |
| Ceftriaxone | 3rd-gen Cephalosporin | Pneumonia, meningitis, hospital infections |
| Amoxicillin | Penicillin | Ear infections, strep throat, chest infections |

Each prediction returns one of three standardised AST labels:

  • 🟢 S: Susceptible (standard dosing is likely to inhibit the bacteria)
  • 🟡 I: Intermediate (uncertain; the outcome depends on dose and site of infection)
  • 🔴 R: Resistant (the antibiotic is likely to fail; the bacteria survive)

📊 Model Performance

| Antibiotic | Accuracy | MCC | F1 (weighted) | AUC (OvR) |
|---|---|---|---|---|
| Ciprofloxacin | 78.3% | 0.553 | 0.717 | 0.934 |
| Ceftriaxone | 83.5% | 0.632 | 0.783 | 0.935 |
| Amoxicillin | 79.8% | 0.572 | 0.732 | 0.934 |

Algorithm: MultiOutputClassifier wrapping RandomForestClassifier (200 trees, max depth 12)
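A minimal sketch of how such a model could be assembled with scikit-learn. The hyperparameters (200 trees, max depth 12) come from the line above; `random_state`, the integer S/I/R encoding, the helper names and the synthetic demo data are assumptions for illustration, not taken from train_model.py.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

ANTIBIOTICS = ["CIPRO", "CEFTRIAXONE", "AMOXICILLIN"]
LABELS = ["S", "I", "R"]  # assumed integer encoding: S=0, I=1, R=2


def build_model(seed: int = 42) -> MultiOutputClassifier:
    """One RandomForest per antibiotic; MultiOutputClassifier fits a clone per target column."""
    forest = RandomForestClassifier(n_estimators=200, max_depth=12, random_state=seed)
    return MultiOutputClassifier(forest)


def synthetic_data(n: int, seed: int = 0):
    """Random stand-in for the 40 feature columns and 3 label columns (NOT real data)."""
    rng = np.random.default_rng(seed)
    X = np.hstack([
        rng.integers(0, 2, size=(n, 15)),   # F1-F15: binary markers
        rng.integers(0, 11, size=(n, 15)),  # F16-F30: counts 0-10
        rng.random(size=(n, 10)),           # F31-F40: proportions 0.00-1.00
    ])
    # Label frequencies roughly follow the S/I/R split reported in Data Cleaning Steps.
    Y = rng.choice(len(LABELS), size=(n, len(ANTIBIOTICS)), p=[0.65, 0.15, 0.20])
    return X, Y


X, Y = synthetic_data(300)
model = build_model().fit(X, Y)
pred = model.predict(X[:5])          # shape (5, 3): one label index per antibiotic
probas = model.predict_proba(X[:5])  # list of 3 arrays, each of shape (5, 3)
for name, column in zip(ANTIBIOTICS, pred.T):
    print(name, [LABELS[i] for i in column])
```

Note that `MultiOutputClassifier` trains the three forests independently; what the targets share is the input feature matrix, not model parameters.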


πŸ—„οΈ Dataset & Data Pipeline

Source

Genomic data and Antimicrobial Susceptibility Testing (AST) results were sourced from the PATRIC / BV-BRC public database, one of the largest freely available bacterial genomics repositories, funded by the U.S. National Institute of Allergy and Infectious Diseases (NIAID).

  • Organism: Escherichia coli (Gram-negative, clinically relevant)
  • Genomes collected: 3,000 whole-genome sequencing (WGS) records
  • AST labels: R / I / S for Ciprofloxacin, Ceftriaxone, and Amoxicillin

Data Cleaning Steps

Raw data from PATRIC required significant cleaning before it was usable:

  1. Duplicate removal: genomes with identical PATRIC IDs or redundant AST entries were dropped
  2. Missing label handling: genomes missing AST results for any of the 3 antibiotics were excluded
  3. Outlier filtering: genomes with abnormal feature distributions (>3 SD from the mean) were flagged and reviewed
  4. Label standardisation: raw MIC values and non-standard labels were mapped to S / I / R using EUCAST 2023 breakpoints
  5. Class distribution check: the final label distribution was S ≈ 65%, I ≈ 15%, R ≈ 20% (imbalanced towards S)
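The cleaning steps above could look roughly like the following stdlib-only sketch. The record layout (`genome_id` plus one MIC value per drug), the helper names and the toy records are assumptions for illustration. The breakpoint numbers are illustrative and should be checked against the official EUCAST 2023 breakpoint table before any real use.

```python
import statistics
from collections import Counter

# Illustrative EUCAST-style breakpoints in mg/L as (S <= s_max, R > r_gt).
# ASSUMPTION: verify against the official EUCAST 2023 table before use.
BREAKPOINTS = {
    "CIPRO": (0.25, 0.5),
    "CEFTRIAXONE": (1.0, 2.0),
    "AMOXICILLIN": (8.0, 8.0),  # equal values mean no I category
}


def mic_to_sir(mic: float, s_max: float, r_gt: float) -> str:
    """Step 4: map a MIC value to S / I / R."""
    if mic <= s_max:
        return "S"
    if mic > r_gt:
        return "R"
    return "I"


def clean(records: list[dict]) -> list[dict]:
    """Steps 1, 2 and 4: deduplicate, drop incomplete AST rows, standardise labels."""
    seen, cleaned = set(), []
    for rec in records:
        if rec["genome_id"] in seen:
            continue  # step 1: duplicate genome ID
        seen.add(rec["genome_id"])
        if any(rec.get(drug) is None for drug in BREAKPOINTS):
            continue  # step 2: missing AST result for one of the 3 antibiotics
        labels = {drug: mic_to_sir(rec[drug], *BREAKPOINTS[drug]) for drug in BREAKPOINTS}
        cleaned.append({"genome_id": rec["genome_id"], **labels})
    return cleaned


def flag_outliers(values: list[float], k: float = 3.0) -> list[int]:
    """Step 3: indices of values more than k standard deviations from the mean."""
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    return [i for i, v in enumerate(values) if sd and abs(v - mean) > k * sd]


raw = [
    {"genome_id": "g1", "CIPRO": 4.0, "CEFTRIAXONE": 0.5, "AMOXICILLIN": 32.0},
    {"genome_id": "g1", "CIPRO": 4.0, "CEFTRIAXONE": 0.5, "AMOXICILLIN": 32.0},
    {"genome_id": "g2", "CIPRO": 0.06, "CEFTRIAXONE": 2.0, "AMOXICILLIN": 4.0},
    {"genome_id": "g3", "CIPRO": 0.5, "CEFTRIAXONE": None, "AMOXICILLIN": 8.0},
]
rows = clean(raw)
# Step 5: label distribution per antibiotic
print({drug: Counter(r[drug] for r in rows) for drug in BREAKPOINTS})
```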

Feature Engineering

Features were extracted from cleaned genome assemblies using two bioinformatics tools:

a) AMRFinderPlus: identifies acquired resistance genes and, when an organism is specified, known resistance point mutations:

amrfinder -n genome.fasta -O Escherichia -o amr_genes.tsv

| Feature Columns | Type | What Was Extracted |
|---|---|---|
| F1–F15 | Binary (0/1) | Presence/absence of resistance genes & key mutations (e.g., gyrA S83L, bla_TEM, bla_CTX-M-15) |
| F16–F30 | Integer (0–10) | Counts of resistance elements, mobile genetic elements, efflux pump genes |
| F31–F40 | Real (0.00–1.00) | Proportional scores: fraction of genome with resistance markers, intact target sites |
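One way the binary F-columns could be derived from the AMRFinderPlus report is sketched below. The feature mapping follows the Feature Reference table in this README, but the exact symbol spellings (`blaTEM`, `gyrA_S83L`, ...) and the report column name (assumed to be `Element symbol` in recent AMRFinderPlus releases and `Gene symbol` in older ones) are assumptions to check against your own output.

```python
import csv
import io

# ASSUMED symbol -> feature rules, following the README's Feature Reference table.
FEATURE_RULES = {
    "F1": lambda s: s == "gyrA_S83L",
    "F2": lambda s: s == "gyrA_D87G",
    "F3": lambda s: s.startswith("blaTEM"),
    "F7": lambda s: s.startswith("blaSHV"),
    "F11": lambda s: s == "blaCTX-M-15",
}


def binary_features(tsv_text: str) -> dict[str, int]:
    """Return 0/1 presence flags for the mapped features from one AMRFinderPlus TSV report."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    symbols = set()
    for row in reader:
        # Column name assumed to differ between AMRFinderPlus versions.
        symbol = row.get("Element symbol") or row.get("Gene symbol") or ""
        symbols.add(symbol.strip())
    return {feat: int(any(rule(s) for s in symbols)) for feat, rule in FEATURE_RULES.items()}


example = "Gene symbol\tClass\nblaTEM-1\tBETA-LACTAM\ngyrA_S83L\tQUINOLONE\n"
print(binary_features(example))
```

For a real report, pass the file contents, e.g. `binary_features(open("amr_genes.tsv", newline="").read())`.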

b) bcftools: calls point mutations from reads aligned to a reference genome:

bcftools mpileup -f reference.fasta genome.bam | bcftools call -mv -o variants.vcf

Mutations in gyrA, parC, ompF, and beta-lactamase regions were recorded as binary features.
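A minimal sketch of turning the VCF written by `bcftools call` into per-gene binary flags: a gene is marked 1 if any called variant falls inside its region. It relies only on the standard VCF layout (tab-separated, CHROM and POS as the first two columns). The chromosome name and coordinates are placeholders, not real gyrA/parC/ompF positions; substitute the annotation of the reference actually used.

```python
import io

# PLACEHOLDER regions (chrom, start, end), 1-based inclusive; NOT real reference coordinates.
REGIONS = {
    "gyrA": ("chr", 1_000, 2_000),
    "parC": ("chr", 5_000, 6_000),
    "ompF": ("chr", 9_000, 10_000),
}


def mutation_flags(vcf_lines) -> dict[str, int]:
    """Return 1 for each region containing at least one variant record, else 0."""
    flags = dict.fromkeys(REGIONS, 0)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        chrom, pos = line.split("\t")[:2]
        pos = int(pos)
        for gene, (r_chrom, start, end) in REGIONS.items():
            if chrom == r_chrom and start <= pos <= end:
                flags[gene] = 1
    return flags


vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr\t1500\t.\tC\tT\t60\tPASS\t.\n"
)
print(mutation_flags(vcf))
```

With a real file, `with open("variants.vcf") as fh: mutation_flags(fh)` works the same way.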


🚀 Quick Start

1. Clone the repo

git clone https://github.com/Rishu-raj-02/AMR-Multi-Antibiotic-Resistance-Predictor.git
cd AMR-Multi-Antibiotic-Resistance-Predictor

2. Install dependencies

pip install -r requirements.txt

3. Train the model

python train_model.py

Reads data/amr_dataset.csv, trains the model, saves artifacts to models/.

4. Start the web app

python app.py

Open http://localhost:5000 in your browser.


πŸ—‚οΈ Project Structure

AMR-Multi-Antibiotic-Resistance-Predictor/
│
├── data/
│   └── amr_dataset.csv          # Cleaned E. coli dataset (3,000 genomes × 44 cols)
│
├── models/                      # Auto-generated by train_model.py
│   ├── amr_model.pkl            # Trained MultiOutputClassifier
│   ├── encoders.pkl             # LabelEncoders for S/I/R
│   ├── metrics.json             # Per-antibiotic performance stats
│   ├── feature_importance.json  # Feature ranking per antibiotic
│   └── feature_cols.json        # Ordered feature column list
│
├── templates/
│   └── index.html               # Dark biotech dashboard UI
│
├── app.py                       # Flask web server + REST API
├── train_model.py               # Model training & evaluation script
├── requirements.txt
├── render.yaml                  # One-click Render.com deployment
├── Procfile                     # Gunicorn process config
└── README.md
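models/feature_cols.json stores the column order used at training time. The /predict request example in API Endpoints sends only a subset of F1–F40, so the server has to fill the rest somehow. This sketch assumes the JSON file is a plain list of names, that missing features default to 0 and that unknown names are rejected; all three are assumptions, not documented behaviour of app.py.

```python
import json


def load_feature_cols(path: str = "models/feature_cols.json") -> list[str]:
    """Read the ordered column list (assumed to be a JSON array of names)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def to_feature_vector(features: dict, feature_cols: list[str], default: float = 0.0) -> list[float]:
    """Order request features to match the training columns, filling gaps with `default`."""
    unknown = set(features) - set(feature_cols)
    if unknown:
        # Assumption: reject names the model was not trained on instead of ignoring them.
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return [float(features.get(col, default)) for col in feature_cols]


# Stand-in for load_feature_cols(); assumes columns F1..F40 in numeric order.
cols = [f"F{i}" for i in range(1, 41)]
vector = to_feature_vector({"F1": 1, "F3": 1, "F16": 7, "F31": 0.85}, cols)
print(len(vector), vector[:3])  # 40 [1.0, 0.0, 1.0]
```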

🌐 API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Interactive web dashboard |
| POST | /predict | Run resistance prediction |
| GET | /metrics | Model performance stats |
| GET | /random_sample | Load a sample genome for demo |

Example /predict request:

POST /predict
{
  "features": {
    "F1": 1, "F2": 0, "F3": 1,
    "F4": 1, "F11": 1,
    "F16": 7, "F17": 3,
    "F31": 0.85, "F32": 0.20, "F33": 0.70
  }
}

Example response:

{
  "predictions": {
    "CIPRO": {
      "label": "R",
      "full": "Resistant",
      "confidence": 87.3,
      "probabilities": { "S": 0.07, "I": 0.06, "R": 0.87 }
    },
    "CEFTRIAXONE": { "label": "S", "confidence": 72.1 },
    "AMOXICILLIN": { "label": "R", "confidence": 81.5 }
  }
}
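A small stdlib client for /predict, as a sketch: it assumes the app is running locally on port 5000 (as in the Quick Start) and that responses have the shape shown above. `build_request`, `predict` and `summarise` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: local dev server from the Quick Start


def build_request(features: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for /predict."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (requires the server to be running)."""
    with urllib.request.urlopen(build_request(features, base_url), timeout=timeout) as resp:
        return json.load(resp)


def summarise(response: dict) -> dict[str, tuple[str, float]]:
    """Extract (label, confidence) per antibiotic from a /predict response."""
    return {drug: (p["label"], p["confidence"]) for drug, p in response["predictions"].items()}


sample_response = {
    "predictions": {
        "CIPRO": {"label": "R", "confidence": 87.3},
        "CEFTRIAXONE": {"label": "S", "confidence": 72.1},
        "AMOXICILLIN": {"label": "R", "confidence": 81.5},
    }
}
print(summarise(sample_response))
```

With the server running, `summarise(predict({"F1": 1, "F3": 1}))` returns the same kind of summary for a live prediction.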

🧬 Feature Reference

F1–F15 · Binary (0 = absent, 1 = present)

Gene presence / mutation markers extracted via AMRFinderPlus:

| Feature | Biological Meaning | Linked Antibiotic |
|---|---|---|
| F1 | gyrA S83L mutation | Ciprofloxacin ↑R |
| F2 | gyrA D87G mutation | Ciprofloxacin ↑R |
| F3 | bla_TEM gene presence | Amoxicillin ↑R |
| F4 | Multi-drug resistance plasmid marker | All three ↑R |
| F7 | bla_SHV gene | Ceftriaxone ↑R |
| F11 | bla_CTX-M-15 (ESBL) gene | Ceftriaxone ↑R |

F16–F30 · Integer (0–10)

Count features: number of resistance mutations, mobile elements, and efflux pump genes detected per genome assembly.

F31–F40 · Real (0.00–1.00)

Proportion scores: fraction of genome containing resistance markers, proportion of intact drug-binding sites, etc.


πŸ₯ Top Resistance Drivers (Feature Importance)

Rank Feature Importance Biological Role
1 F3 0.1018 bla_TEM β€” Amoxicillin resistance
2 F2 0.0719 gyrA mutation β€” Cipro resistance
3 F4 0.0570 Multi-drug plasmid marker
4 F7 0.0553 bla_SHV β€” Ceftriaxone resistance
5 F1 0.0536 gyrA S83L β€” Cipro resistance
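feature_importance.json is described as a ranking per antibiotic, while the table above is a single ranking. One plausible way to get from one to the other, assumed here rather than confirmed from train_model.py, is to average each feature's importance across the three per-antibiotic forests (for a fitted `MultiOutputClassifier`, the `feature_importances_` of each estimator in `model.estimators_`) and sort. The demo values are made up.

```python
def overall_ranking(per_drug: dict[str, dict[str, float]], top_k: int = 5) -> list[tuple[str, float]]:
    """Average importances across antibiotics and return the top_k (feature, mean) pairs."""
    features = set().union(*(imp.keys() for imp in per_drug.values()))
    n = len(per_drug)
    mean = {f: sum(imp.get(f, 0.0) for imp in per_drug.values()) / n for f in features}
    # Sort by descending mean importance; ties broken by feature name for stable output.
    return sorted(mean.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Made-up example values, NOT the model's real importances.
demo = {
    "CIPRO": {"F1": 0.12, "F2": 0.15, "F3": 0.01},
    "CEFTRIAXONE": {"F1": 0.02, "F2": 0.03, "F3": 0.05},
    "AMOXICILLIN": {"F1": 0.01, "F2": 0.03, "F3": 0.25},
}
print(overall_ranking(demo, top_k=3))
```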

πŸ₯ Clinical Decision Support

The web app generates real-time clinical guidance:

  • Flags Resistant predictions with evidence-based alternative drug suggestions
  • Recommends confirmatory MIC lab testing for borderline (I) results
  • Displays probability distributions across all three resistance classes
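The three behaviours listed above can be expressed as a simple rule per antibiotic. This is a hypothetical sketch: the message wording and the `guidance` helper are invented, and the alternative-drug lookup is left as an empty hook rather than filled with real drug names.

```python
def guidance(drug: str, label: str, probabilities: dict[str, float], alternatives=None) -> list[str]:
    """Turn one S/I/R prediction into dashboard messages (hypothetical rules)."""
    alternatives = alternatives or {}  # hypothetical {drug: [alternative names]} lookup
    spread = ", ".join(f"{k}={probabilities.get(k, 0.0):.0%}" for k in ("S", "I", "R"))
    messages = [f"{drug}: {label} ({spread})"]  # probability distribution across classes
    if label == "R":
        options = ", ".join(alternatives.get(drug, [])) or "see local guidelines"
        messages.append(f"{drug} flagged resistant; alternatives: {options}")
    elif label == "I":
        messages.append(f"{drug} borderline; confirm with MIC laboratory testing")
    return messages


for line in guidance("CIPRO", "R", {"S": 0.07, "I": 0.06, "R": 0.87}):
    print(line)
```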

⚠️ Disclaimer: This tool is intended for research and educational purposes. Clinical treatment decisions must always be confirmed with certified laboratory antimicrobial susceptibility testing.


📖 How the Model Was Built

  1. Data Collection: 3,000 E. coli WGS records with AST results from PATRIC/BV-BRC
  2. Cleaning: duplicate removal, missing label exclusion, EUCAST breakpoint standardisation
  3. Feature Extraction: AMRFinderPlus (resistance genes) + bcftools (point mutations)
  4. Feature Engineering: 40 columns (15 binary + 15 integer + 10 real-valued proportions)
  5. Modelling: MultiOutputClassifier (RandomForest, 200 estimators, depth 12)
  6. Evaluation: 80/20 stratified split; Accuracy, MCC, F1-weighted, AUC-ROC (OvR)
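The evaluation step could be reproduced along these lines. The metrics mirror step 6 above (accuracy, MCC, weighted F1, one-vs-rest AUC); the synthetic data, seeds, label encoding and `evaluate` helper are assumptions, so the printed numbers will not match the Model Performance table.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

ANTIBIOTICS = ["CIPRO", "CEFTRIAXONE", "AMOXICILLIN"]

rng = np.random.default_rng(0)
X = rng.random((400, 40))  # synthetic stand-in for F1-F40
Y = rng.choice(3, size=(400, 3), p=[0.65, 0.15, 0.20])  # 0=S, 1=I, 2=R (assumed encoding)

# 80/20 split stratified on the Ciprofloxacin column (assumption; the README does not say which).
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, stratify=Y[:, 0], random_state=42)

model = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=200, max_depth=12, random_state=42)
).fit(X_tr, Y_tr)
Y_pred = model.predict(X_te)
probas = model.predict_proba(X_te)  # one (n_test, n_classes) array per antibiotic


def evaluate(i: int) -> dict[str, float]:
    """Compute the four reported metrics for antibiotic column i."""
    y_true, y_hat, est = Y_te[:, i], Y_pred[:, i], model.estimators_[i]
    return {
        "accuracy": accuracy_score(y_true, y_hat),
        "mcc": matthews_corrcoef(y_true, y_hat),
        "f1_weighted": f1_score(y_true, y_hat, average="weighted", zero_division=0),
        "auc_ovr": roc_auc_score(y_true, probas[i], multi_class="ovr", labels=est.classes_),
    }


metrics = {name: evaluate(i) for i, name in enumerate(ANTIBIOTICS)}
for name, m in metrics.items():
    print(name, {k: round(float(v), 3) for k, v in m.items()})
```

Stratifying on a single drug is a simplification: stratifying on the joint S/I/R combination of all three drugs can fail when a rare combination occurs in only one genome.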

🤝 Contributing

Pull requests welcome. For major changes, please open an issue first.


📄 License

MIT: see LICENSE


Built for IIT Mandi Hackathon: E. coli · Ciprofloxacin · Ceftriaxone · Amoxicillin
