A complete end-to-end machine learning project predicting customer churn using demographic, transactional, and behavioural data.
| Item | Detail |
|---|---|
| Problem | Binary classification — predict if a customer will churn |
| Dataset | 1,000 customers across 5 data sources |
| Best Model | Logistic Regression (Tuned) |
| Best Recall | 46.3% |
| Best F1 Score | 27.9% |
| Best AUC-ROC | 52.9% (Random Forest) |
```
churn_prediction_project/
├── data/
│   └── raw/                              ← original data (not uploaded)
├── notebooks/
│   ├── 01_EDA_Churn.ipynb                ← exploratory data analysis
│   └── 02_Preprocessing_Modelling.ipynb  ← ML pipeline
├── reports/
│   ├── 01_churn_distribution.png
│   ├── 02_age_analysis.png
│   ├── 03_categorical_churn.png
│   ├── 04_transaction_analysis.png
│   ├── 05_online_activity_churn.png
│   ├── 06_service_churn.png
│   ├── 07_correlation_heatmap.png
│   ├── 08_model_comparison.png
│   └── 09_feature_importance.png
├── .gitignore
├── README.md
└── requirements.txt
```
- Dataset is 80/20 imbalanced — 796 stayed, 204 churned
- LoginFrequency has the strongest linear correlation with churn, though the relationship is weak (r = -0.08)
- TotalSpent and NumTransactions have 0.90 correlation — multicollinearity detected
- 332 customers never contacted customer service (structural zeros)
- Categorical features (Gender, MaritalStatus) show weak churn signal
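The imbalance and multicollinearity checks above can be sketched in a few lines of pandas. The tiny frame below is invented for illustration; only the column names (`Churn`, `LoginFrequency`, `TotalSpent`, `NumTransactions`) mirror the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real customer data (values invented).
df = pd.DataFrame({
    "Churn":           [0, 0, 0, 0, 1],
    "LoginFrequency":  [30, 25, 28, 22, 5],
    "TotalSpent":      [500, 400, 450, 380, 100],
    "NumTransactions": [50, 40, 46, 39, 10],
})

# Class balance: a split like 80/20 warrants class weighting or
# resampling before modelling.
balance = df["Churn"].value_counts(normalize=True)

# Pairwise correlation flags multicollinearity; one feature of a
# highly correlated pair (e.g. TotalSpent vs NumTransactions) can be dropped.
corr = df[["TotalSpent", "NumTransactions", "LoginFrequency"]].corr()

print(balance)
print(corr.round(2))
```

The same two checks on the full dataset are what surfaced the 80/20 imbalance and the 0.90 TotalSpent/NumTransactions correlation reported above.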
| Model | Accuracy | Precision | Recall | F1 Score | AUC-ROC |
|---|---|---|---|---|---|
| Logistic Regression | 51.0% | 20.0% | 46.3% | 27.9% | 47.4% |
| Random Forest | 63.0% | 20.0% | 26.8% | 22.9% | 52.9% |
| XGBoost | 55.5% | 14.7% | 24.4% | 18.3% | 47.6% |
- Highest Recall (46.3%) — catches nearly half of actual churners, the most of any model tested
- No overfitting — Train-Test gap < 10%
- Interpretable — coefficient-level explainability supports the model transparency expected under FCA banking regulation
- Random Forest and XGBoost showed severe overfitting before tuning
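As a sketch of how the five metrics in the comparison table are produced, the snippet below trains a class-weighted Logistic Regression on synthetic data (a single invented login-frequency feature, not the project's real pipeline) and scores a held-out split. `class_weight="balanced"` is one plausible tuning choice that trades precision for recall, consistent with the tuned model leading on recall above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

rng = np.random.default_rng(42)

# Synthetic stand-in: ~80/20 imbalance, churners (y=1) log in less.
n = 1000
y = (rng.random(n) < 0.2).astype(int)
X = rng.normal(20 - 10 * y, 6, n).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Balanced class weights up-weight the minority churn class.
model = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)
pred = model.predict(X_te)
proba = model.predict_proba(X_te)[:, 1]

print(f"Accuracy:  {accuracy_score(y_te, pred):.3f}")
print(f"Precision: {precision_score(y_te, pred):.3f}")
print(f"Recall:    {recall_score(y_te, pred):.3f}")
print(f"F1:        {f1_score(y_te, pred):.3f}")
print(f"AUC-ROC:   {roc_auc_score(y_te, proba):.3f}")
```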
| Rank | Feature | Direction | Business Meaning |
|---|---|---|---|
| 1 | LoginFrequency | ↓ decreases churn | More logins = loyal customer |
| 2 | AvgSpent | ↑ increases churn | Higher spenders still churn |
| 3 | DaysSinceLogin | ↑ increases churn | Inactivity = churn signal |
| 4 | IncomeLevel_Low | ↑ increases churn | Price-sensitive segment |
| 5 | MainInteractionType_Complaint | ↑ increases churn | Complaints predict leaving |
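A ranking like the table above can be read straight off a fitted Logistic Regression: standardise the features so coefficient magnitudes are comparable, then rank by absolute coefficient, with the sign giving direction. The frame below is a toy example; feature names echo the table but the values are invented.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data: churners (y=1) log in rarely, spend more, and lapse longer.
X = pd.DataFrame({
    "LoginFrequency": [30, 25, 28, 22, 5, 8, 26, 4],
    "AvgSpent":       [50, 40, 45, 38, 90, 85, 42, 95],
    "DaysSinceLogin": [2, 3, 1, 4, 40, 35, 2, 50],
})
y = np.array([0, 0, 0, 0, 1, 1, 0, 1])

# Standardising first makes coefficient magnitudes comparable as
# rough importances.
Xs = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(Xs, y)

importance = (pd.Series(model.coef_[0], index=X.columns)
              .sort_values(key=np.abs, ascending=False))
# Negative coefficients decrease churn odds; positive ones increase them.
print(importance)
```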
```bash
git clone https://github.com/YOUR_USERNAME/churn_prediction_project.git
cd churn_prediction_project
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # Mac/Linux
pip install -r requirements.txt
```
Place Customer_Churn_Data_Large.xlsx in data/raw/
01_EDA_Churn.ipynb → EDA and findings
02_Preprocessing_Modelling.ipynb → ML pipeline
| Tool | Purpose |
|---|---|
| Python 3.13 | Core language |
| Pandas 3.0.1 | Data manipulation |
| NumPy 2.4.3 | Numerical computing |
| Matplotlib 3.10 | Visualisation |
| Seaborn 0.13.2 | Statistical plots |
| Scikit-learn 1.8.0 | ML models and preprocessing |
| XGBoost 3.2.0 | Gradient boosting |
| Jupyter Notebook | Development environment |
- Deploy Logistic Regression for monthly customer scoring
- Three-tier risk segmentation — High/Medium/Low risk
- Estimated impact — £14M+ annual revenue protected at scale
- Key action — Re-engage customers with <15 logins/month
- Data improvement — Add account tenure and product holdings
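The three-tier segmentation recommended above reduces to bucketing each customer's predicted churn probability. A minimal sketch, assuming monthly scores from the deployed model; the 0.3 / 0.6 cut-offs are illustrative and would in practice be calibrated against retention-campaign capacity.

```python
import pandas as pd

# Hypothetical churn probabilities from a monthly scoring run.
scores = pd.Series([0.05, 0.12, 0.35, 0.48, 0.71, 0.90],
                   index=["C001", "C002", "C003", "C004", "C005", "C006"])

# Three risk tiers; bin edges are assumptions, not project outputs.
tiers = pd.cut(scores, bins=[0.0, 0.3, 0.6, 1.0],
               labels=["Low", "Medium", "High"])
print(tiers)
```

High-tier customers would be the first targets for the re-engagement action above (e.g. customers with <15 logins/month).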
Samitha Sandaruwan
Aspiring Data Scientist
GitHub
Project developed as part of Lloyds Banking Group Data Science Job Simulation.