This repository contains the work for my Major Technical Project-1, completed during the 4th year of my B.Tech. It explores the use of machine learning to identify potential biomarkers from patient metabolite data. The goal is to aid early disease diagnosis by extracting biologically significant features from large-scale sample datasets.
The objective is to develop a machine learning model that classifies patient metabolite data and identifies a small subset of biomarkers predictive of disease occurrence.
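One common way to arrive at a small candidate subset is to rank features by a tree ensemble's importance scores and keep the top few. The sketch below illustrates that idea only; it is not the project's actual selection procedure. The data is synthetic (generated with `make_classification`), and the `metabolite_names` labels and the cutoff `top_k = 5` are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a metabolite table: 100 samples x 50 metabolites.
# (Assumption: the project's real dataset is not reproduced here.)
X, y = make_classification(
    n_samples=100,
    n_features=50,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)
metabolite_names = [f"metabolite_{i}" for i in range(X.shape[1])]

# Fit a random forest and rank features by impurity-based importance,
# highest first.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]

# Keep the top-k features as candidate biomarkers (k is an arbitrary choice).
top_k = 5
candidates = [metabolite_names[i] for i in ranking[:top_k]]
print("Candidate biomarkers:", candidates)
```

Impurity-based importances can favor correlated or high-variance features, so in practice a ranking like this would usually be cross-checked, for example with permutation importance or against biological context.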
- Analyzed 100 patient samples (cases + controls) for metabolite profiling.
- Used MetaboAnalyst for:
  - Data normalization
  - Preprocessing
  - PCA visualization
- Applied ML models:
  - Support Vector Machine (SVM)
  - XGBoost
  - Random Forest
- Achieved 92% data similarity when validated against known datasets.
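The workflow listed above (normalization, PCA, then comparing several classifiers) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: the data is synthetic, scikit-learn's `StandardScaler` and `PCA` stand in for the steps that were done in MetaboAnalyst, and the hyperparameters and 5-fold cross-validation are arbitrary choices. XGBoost is included only if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 100 patient samples with 50 metabolite features.
X, y = make_classification(
    n_samples=100, n_features=50, n_informative=5, random_state=0
)

# Scale the features and project onto two principal components,
# e.g. as coordinates for a 2D scatter plot of cases vs. controls.
X_scaled = StandardScaler().fit_transform(X)
pcs = PCA(n_components=2).fit_transform(X_scaled)

# Candidate models. The SVM is wrapped in a pipeline so that scaling
# is refit inside each cross-validation fold.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
try:
    from xgboost import XGBClassifier  # optional dependency
    models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass

# Compare models by mean 5-fold cross-validated accuracy.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

With only 100 samples, cross-validated scores vary noticeably between splits, so small differences between models should be read with caution.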
- Python (pandas, scikit-learn, xgboost, seaborn, matplotlib)
- MetaboAnalyst (web-based metabolomics data analysis)
- Jupyter Notebook
- PDF report documenting the methodology and results
- Gained hands-on experience with biomarker discovery pipelines.
- Learned feature selection and model comparison in bioinformatics.
- Practiced end-to-end machine learning workflows with biomedical data.
- Validated findings using both biological context and model outputs.