GlucoTrack: Predicting Diabetes Likelihood from Lifestyle and Clinical Data

Deployment

Run the Streamlit app locally to interact with the champion model (Balanced FFNN) and get diabetes risk predictions.

Prerequisites:

Python 3.9+
Dependencies: streamlit, torch, pandas, numpy, scikit-learn, joblib

Quick start:

```bash
# Create and activate a virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate  # Linux/macOS

# Install dependencies
pip install --upgrade pip
pip install streamlit torch pandas numpy scikit-learn joblib

# Launch the app
cd yan-cotta
streamlit run app.py
```

Notes:

The app loads artifacts from yan-cotta/deployment_artifacts (model weights and preprocessing objects) generated from the Week 2 and Week 3 notebooks.
If you update preprocessing or retrain the model, re-run the last cells in those notebooks to refresh the files in deployment_artifacts before restarting the app.
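A small pre-flight check can confirm that the artifacts are in place before you launch the app. The following is only a minimal sketch and not part of the project: the helper `missing_artifacts` and the file names `model.pt` and `scaler.joblib` are hypothetical, since the actual names written by the notebooks are not listed here.

```python
from pathlib import Path


def missing_artifacts(artifact_dir, expected_names):
    """Return the expected file names that are absent from artifact_dir."""
    directory = Path(artifact_dir)
    if not directory.is_dir():
        # Nothing can be present if the directory itself is missing.
        return sorted(expected_names)
    present = {p.name for p in directory.iterdir() if p.is_file()}
    return sorted(name for name in expected_names if name not in present)


if __name__ == "__main__":
    # Hypothetical file names; replace them with the ones your notebooks write.
    expected = ["model.pt", "scaler.joblib"]
    missing = missing_artifacts("yan-cotta/deployment_artifacts", expected)
    if missing:
        print("Missing artifacts:", ", ".join(missing))
    else:
        print("All expected artifacts found.")
```

Run it from the repository root so the relative path resolves; if anything is reported missing, re-run the final notebook cells as described above.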

Deploy to Streamlit Cloud

Ensure these files exist at the repository root:
- requirements.txt (Python dependencies)
- runtime.txt (requested Python version; Streamlit Cloud does not always honor it and may default to Python 3.13)
In Streamlit Cloud, set the entrypoint to yan-cotta/app.py.
PyTorch note: Cloud currently uses Python 3.13, so torch==2.8.0 is pinned in requirements.txt because it provides wheels for 3.13.
After deployment, use “Manage app” → Logs to inspect any missing dependency errors.
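Before pushing, it can help to check locally that the pinned versions in requirements.txt match what is installed in your environment. The sketch below is an assumption-laden illustration, not a project script: `parse_pins` and `find_mismatches` are hypothetical helpers, and the parser only understands simple `name==version` lines (no extras, environment markers, or version ranges).

```python
from importlib import metadata
from pathlib import Path


def parse_pins(requirements_text):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned or blank lines are ignored in this sketch
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            # Strip a local version suffix such as "+cpu" before comparing.
            installed = metadata.version(name).split("+", 1)[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.is_file():
        for name, (pinned, installed) in find_mismatches(
            parse_pins(req_file.read_text())
        ).items():
            print(f"{name}: pinned {pinned}, installed {installed}")
```

A clean local run does not guarantee a clean Cloud build, because the Cloud Python version may differ from yours; the deployment logs remain the authoritative place to look.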

🧠 Project Overview

Shared repository with all team members' work: https://github.com/SuperDataScience-Community-Projects/SDS-CP035-gluco-track

GlucoTrack is a health-focused classification project that aims to predict whether an individual is non-diabetic or has pre-diabetes or diabetes, using self-reported survey and biometric data. The dataset includes responses collected by the CDC on physical activity, BMI, mental health, general wellness, and other lifestyle variables.

The project is built around a real-world public health challenge: how do we identify individuals at risk before it’s too late? You’ll analyze patterns in health behavior and demographic variables to develop a predictive model that supports early interventions.

This project is split into two experience tracks:

🟢 Beginner Track – Build traditional classification models using scikit-learn and deploy a simple Streamlit app.
🔴 Advanced Track – Design and train a deep learning classifier with PyTorch or TensorFlow, integrate embeddings for categorical data, and deploy via Docker or Hugging Face Spaces.

🧪 Dataset Summary

Source: CDC Behavioral Risk Factor Surveillance System (BRFSS)
Instances: ~250,000
Target Variable: Diabetes_binary (0 = No, 1 = Pre-diabetes or Diabetes)
Features: 20+ features covering BMI, physical activity, mental health days, general health, smoking status, and more
Data Type: Tabular, mostly binary and ordinal features
Task Type: Classification
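Because the target is binary, a useful first look at the data is how the two classes are distributed. The sketch below uses only the standard library and a tiny made-up sample, not real BRFSS rows; `class_counts` and `positive_rate` are hypothetical helper names, and the assumption that labels may appear as "0.0"/"1.0" strings should be checked against your copy of the file.

```python
import csv
import io
from collections import Counter


def class_counts(rows, target="Diabetes_binary"):
    """Count rows per class label in an iterable of dict-like records."""
    counts = Counter()
    for row in rows:
        # Labels may be stored as "0", "1", "0.0", or "1.0"; normalise to int.
        counts[int(float(row[target]))] += 1
    return counts


def positive_rate(counts):
    """Fraction of rows labelled 1 (pre-diabetes or diabetes)."""
    total = sum(counts.values())
    return counts[1] / total if total else 0.0


# Illustrative in-memory sample; replace with open("<your file>.csv", newline="").
sample = io.StringIO("Diabetes_binary,BMI\n0.0,24\n1.0,31\n0.0,27\n0.0,22\n")
counts = class_counts(csv.DictReader(sample))
print(counts[0], counts[1], positive_rate(counts))  # prints: 3 1 0.25
```

If the positive rate on the real data turns out to be low, accuracy alone will be a misleading metric, which is one reason to look at the class split before choosing how to evaluate a model.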

🧰 Tools & Libraries

ML Libraries: scikit-learn, XGBoost, LightGBM (Beginner); PyTorch or TensorFlow (Advanced)
Visualization: matplotlib, seaborn
Deployment: Streamlit (Beginner), Docker + Hugging Face Spaces or Flask API (Advanced)
Experiment Tracking: MLflow

📂 Track Structure

🟢 Beginner Track

➡️ Beginner Scope of Works ➡️ Beginner Report Template ➡️ Submit Your Work

🔴 Advanced Track

➡️ Advanced Scope of Works ➡️ Advanced Report Template ➡️ Submit Your Work

🚀 Learning Outcomes

By the end of this project, you will:

Understand health-related feature engineering and data cleaning techniques
Apply classification models and evaluate their performance with appropriate metrics
Track experiments and manage reproducibility using MLflow
Deploy interactive predictive tools for real-world use cases

👥 Who Should Join?

This project is ideal for:

Aspiring machine learning engineers and data scientists interested in healthcare analytics
Beginners seeking to master the end-to-end ML workflow
Intermediate learners who want to experiment with neural networks and sequence modeling

📝 Acknowledgements

This project uses open public health data provided by the Centers for Disease Control and Prevention (CDC).
