End-to-end machine learning pipeline that predicts customer churn for a telecom provider, with SHAP explainability and an interactive Streamlit app.
Customer churn is one of the most costly problems in the telecom industry. This project builds a full ML pipeline that identifies customers at risk of churning and explains why using SHAP values.
The key insight: Logistic Regression outperformed XGBoost, Random Forest, and LightGBM on this dataset, reaching a ROC-AUC of 0.83. On this data a simple, well-understood model beat the more complex ones, so understanding the data mattered more than reaching for the most complex model.
Adjust the customer profile in the sidebar to get a real-time churn probability, risk level, and a SHAP waterfall explanation for each prediction.
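
As a rough illustration of how a churn probability could be turned into a risk level, here is a minimal sketch. The function name `risk_level` and the 0.3 / 0.6 cut-offs are assumptions made for this example; the thresholds the app actually uses may differ.

```python
def risk_level(probability: float, low: float = 0.3, high: float = 0.6) -> str:
    """Map a churn probability to a coarse risk label.

    The 0.3 / 0.6 cut-offs are illustrative assumptions, not the app's values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low:
        return "Low"
    if probability < high:
        return "Medium"
    return "High"


# Example probabilities chosen only to exercise each branch.
for p in (0.12, 0.45, 0.81):
    print(f"{p:.2f} -> {risk_level(p)}")
```

Keeping the cut-offs as parameters makes it easy to tune them against the cost of missing a churner versus contacting a loyal customer.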
streamlit run app.py

| Metric | Score |
|---|---|
| ROC-AUC | 0.83 |
| Average Precision | 0.64 |
| True Positives (Churn correctly identified) | 301 |
| False Negatives (Churn missed) | 73 |
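
The two confusion-matrix counts in the table are enough to work out recall, the share of actual churners the model caught. The short calculation below assumes both counts come from the same evaluation set.

```python
true_positives = 301   # churners correctly identified (from the table)
false_negatives = 73   # churners the model missed (from the table)

# Every actual churner is either caught (TP) or missed (FN).
actual_churners = true_positives + false_negatives
recall = true_positives / actual_churners
miss_rate = false_negatives / actual_churners

print(f"Actual churners: {actual_churners}")
print(f"Recall: {recall:.3f}")
print(f"Miss rate: {miss_rate:.3f}")
```

That is 301 of 374 churners, so the model catches roughly 80% of them and misses about one in five.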
Key findings from SHAP:
- Contract type is by far the strongest predictor — month-to-month customers churn at a much higher rate
- Tenure is the second most important feature — newer customers are significantly more likely to churn
- Monthly charges and lack of OnlineSecurity / TechSupport are the next biggest drivers
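
For a linear model such as Logistic Regression, and assuming features are treated as independent, the SHAP value of a feature in log-odds space reduces to its coefficient times the feature's deviation from the mean. The sketch below illustrates that decomposition, which is what a waterfall plot visualizes. Every number and feature name in it is hypothetical and chosen for illustration; none of them come from the trained model.

```python
import math

# Hypothetical values for illustration only: NOT the project's fitted model.
feature_names = ["contract_month_to_month", "tenure", "monthly_charges"]
coefficients = [1.2, -0.8, 0.4]      # log-odds weights (made up)
feature_means = [0.55, 0.0, 0.0]     # training-set means (made up)
customer = [1.0, -1.1, 0.6]          # one customer's scaled features (made up)
intercept = -1.5                     # made up


def log_odds(x):
    """Linear model output before the sigmoid."""
    return intercept + sum(c * v for c, v in zip(coefficients, x))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Linear SHAP under feature independence: phi_i = coef_i * (x_i - mean_i).
contributions = [
    c * (v - m) for c, v, m in zip(coefficients, customer, feature_means)
]

base_value = log_odds(feature_means)  # model output for the "average" customer
prediction = log_odds(customer)

# Print features from largest to smallest absolute contribution.
for name, phi in sorted(zip(feature_names, contributions), key=lambda t: -abs(t[1])):
    print(f"{name:>26}: {phi:+.2f}")

# The contributions add up to the gap between this prediction and the baseline.
assert math.isclose(base_value + sum(contributions), prediction)
print(f"churn probability: {sigmoid(prediction):.2f}")
```

Positive contributions push the prediction toward churn and negative ones away from it, which is how a short tenure or a month-to-month contract would show up as a large bar in the explanation.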
ml-churn-predictor/
├── app.py # Streamlit web app
├── run_pipeline.py # End-to-end training script
├── requirements.txt
├── notebooks/
│ ├── 01_eda.ipynb # Exploratory data analysis
│ └── 02_modelling.ipynb # Model comparison and evaluation
├── src/
│ ├── preprocess.py # Data cleaning and splitting
│ ├── features.py # Feature engineering pipeline
│ ├── train.py # Model training and CV
│ ├── evaluate.py # Metrics and plots
│ ├── explain.py # SHAP explainability
│ └── predict.py # Inference helpers
├── models/
│ └── plots/ # Saved evaluation plots
└── data/ # Not tracked — see below
1. Clone the repo
git clone https://github.com/ashbix23/ML-Churn-Predictor.git
cd ML-Churn-Predictor
2. Install dependencies
pip install -r requirements.txt
3. Download the dataset
Download the Telco Customer Churn dataset from Kaggle and place it at:
data/WA_Fn-UseC_-Telco-Customer-Churn.csv
4. Run the pipeline
python run_pipeline.py
This will train all models, run cross-validation, save the best model, and generate all SHAP and evaluation plots.
5. Launch the app
streamlit run app.py

- scikit-learn: preprocessing pipeline, Logistic Regression, Random Forest
- XGBoost / LightGBM: gradient boosting models
- SHAP: model explainability
- Streamlit: interactive web app
- pandas / numpy / matplotlib / seaborn: data processing and visualization
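
To give a feel for how the scikit-learn pieces of this stack fit together when comparing models by cross-validated ROC-AUC, here is a minimal sketch. It is not the code in `run_pipeline.py`: it uses synthetic data from `make_classification` as a stand-in for the Telco dataset, only two of the model families, and fold count and hyperparameters picked for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the churn data (NOT the Telco dataset).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.73, 0.27], random_state=0,
)

# Candidate models; scaling lives inside the pipeline so it is fit per fold.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the churn ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean ROC-AUC = {score:.3f}")
print(f"best model: {best_name}")
```

Putting the scaler inside the pipeline avoids leaking statistics from validation folds into training, which keeps the comparison between models fair.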
The dataset is not included in this repo. Download it from Kaggle: https://www.kaggle.com/datasets/blastchar/telco-customer-churn
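
Before running the pipeline, it can help to confirm the CSV is where the code expects it. The sketch below uses only the standard library; the helper name `summarize_churn` is hypothetical, and it assumes the file has a `Churn` column with `Yes` / `No` values.

```python
import csv
from pathlib import Path

# Expected location, as given in the setup steps.
DATA_PATH = Path("data/WA_Fn-UseC_-Telco-Customer-Churn.csv")


def summarize_churn(path: Path) -> dict:
    """Return the row count and churn rate of a CSV with a Yes/No "Churn" column.

    The column name and its Yes/No encoding are assumptions about the file.
    """
    if not path.exists():
        raise FileNotFoundError(
            f"Dataset not found at {path}; download it from Kaggle first."
        )
    # utf-8-sig reads plain UTF-8 and also strips a byte-order mark if present.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        rows = list(csv.DictReader(handle))
    if not rows or "Churn" not in rows[0]:
        raise ValueError('Expected a non-empty CSV with a "Churn" column')
    churned = sum(1 for row in rows if row["Churn"].strip().lower() == "yes")
    return {"rows": len(rows), "churn_rate": churned / len(rows)}


if DATA_PATH.exists():
    print(summarize_churn(DATA_PATH))
else:
    print(f"Missing {DATA_PATH}")
```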


