A machine learning system that detects fraudulent credit card transactions using ensemble methods, deployed as an interactive web application.
Live Application: https://credit-card-fraud-detection-hrczoamwj8xufv5umabmg6.streamlit.app/
- Overview
- Features
- Project Structure
- Technologies Used
- Performance Metrics
- Quick Start
- Usage
- Deployment
- Contributing
- License
- Author
This project implements a fraud detection system that identifies fraudulent credit card transactions in real time. It handles a highly imbalanced dataset (fraud rate of roughly 0.17%) using SMOTE oversampling, ensemble methods, and hyperparameter tuning.
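To make the oversampling idea concrete, here is a minimal from-scratch sketch of SMOTE-style interpolation using only the standard library. It is an illustration, not the imbalanced-learn implementation the project depends on; the function name `smote_like_samples`, the toy points, and the choice `k=2` are assumptions made for the example.

```python
import math
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbour.
        u = rng.random()
        synthetic.append(tuple(b + u * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D fraud examples (illustrative values, not real transactions).
fraud_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_like_samples(fraud_points, n_new=6)
print(len(new_points), "synthetic minority samples")
```

Each synthetic point lies on a segment between two existing fraud examples, so the minority class grows without simply duplicating rows.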
- Robust ML Pipeline: Preprocessing, feature engineering, model training, and evaluation
- Interactive Web App: User-friendly Streamlit interface for single and batch predictions
- Production-Ready: Deployed on Streamlit Cloud with automatic dependency management
- Comprehensive Analysis: Jupyter notebooks for exploratory data analysis and model interpretation
- Automated Preprocessing: Robust scaling and data splitting with stratified sampling
- Multiple ML Models: Logistic Regression, Random Forest, and XGBoost with cross-validation
- Imbalance Handling: SMOTE oversampling and class-weighted models
- Interactive Dashboard: Real-time predictions with adjustable decision thresholds
- Performance Visualization: ROC curves, Precision-Recall curves, and confusion matrices
- Model Interpretability: SHAP integration for feature importance analysis
- Batch Processing: Upload CSV files for bulk transaction analysis
- Export Results: Download predictions with fraud probabilities and labels
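Robust scaling centres each feature on its median and divides by the interquartile range, so a handful of very large amounts barely affects the scale. Below is a minimal standard-library sketch of that formula. The `robust_scale` helper and the sample amounts are illustrative assumptions; the project itself stores a fitted RobustScaler in models/scaler.joblib.

```python
import statistics


def robust_scale(values):
    """Scale values as (x - median) / IQR."""
    median = statistics.median(values)
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(x - median) / iqr for x in values]


# Hypothetical transaction amounts with one extreme outlier.
amounts = [10.0, 20.0, 30.0, 40.0, 5000.0]
scaled = robust_scale(amounts)
print(scaled)  # → [-1.0, -0.5, 0.0, 0.5, 248.5]
```

The four typical amounts land between -1 and 0.5, while the outlier stays visibly extreme without stretching the scale for the rest.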
credit-card-fraud-detection/
├── data/
│   ├── raw/                      # Raw dataset (creditcard.csv)
│   └── processed/                # Processed splits (X_train, X_test, y_train, y_test)
├── notebooks/
│   ├── 01_EDA.ipynb              # Exploratory Data Analysis
│   └── 02_modeling.ipynb         # Model training and evaluation
├── src/
│   ├── preprocess.py             # Data preprocessing and scaling
│   ├── train.py                  # Model training pipeline
│   ├── evaluate_model.py         # Model evaluation and visualization
│   └── predict.py                # Prediction functions (single & batch)
├── app/
│   └── app.py                    # Streamlit web application
├── models/
│   ├── fraud_model.joblib        # Trained XGBoost model
│   ├── scaler.joblib             # Fitted RobustScaler
│   ├── metrics.txt               # Performance metrics
│   └── plots/                    # Evaluation plots (ROC, PR curves)
├── scripts/
│   └── generate_synthetic.py     # Synthetic data generator
├── requirements.txt              # Python dependencies
├── LICENSE                       # MIT License
└── README.md                     # This file
- Machine Learning: scikit-learn, XGBoost, imbalanced-learn
- Data Processing: pandas, numpy
- Visualization: matplotlib, seaborn, SHAP
- Web Framework: Streamlit
- Model Persistence: joblib
- Development: Jupyter Lab
The best-performing model (XGBoost) achieved:
| Metric | Score |
|---|---|
| ROC-AUC | 0.9784 |
| PR-AUC | 0.6970 |
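Note: For highly imbalanced problems, PR-AUC (area under the precision-recall curve) is often more informative than ROC-AUC because it focuses on the precision/recall trade-off for the rare positive class. In the cross-validation comparison below, Logistic Regression has the highest ROC-AUC while XGBoost has the highest PR-AUC.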
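A small worked example shows how far the two metrics can diverge when positives are rare. This is a from-scratch sketch with made-up scores, not the project's evaluation code. It uses average precision as one common estimate of PR-AUC and assumes no tied scores. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` are the usual tools.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (no ties)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# Hypothetical scores: 1,000 transactions, 2 frauds ranked 5th and 10th.
n = 1000
scores = [1.0 - i / n for i in range(n)]  # strictly decreasing scores
labels = [0] * n
labels[4] = labels[9] = 1

print(f"ROC-AUC: {roc_auc(labels, scores):.4f}")                      # → 0.9940
print(f"Average precision: {average_precision(labels, scores):.4f}")  # → 0.2000
```

Only 12 of the 1,996 fraud/legitimate pairs are misordered, so ROC-AUC is near perfect, yet four out of five alerts at the top of the ranking are false alarms, which is what the low average precision reflects.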
| Model | CV ROC-AUC | CV PR-AUC |
|---|---|---|
| Logistic Regression | 0.9907 | 0.4677 |
| Random Forest | 0.9498 | 0.1879 |
| XGBoost | 0.9610 | 0.5912 |
- Python 3.8 or higher
- pip package manager
- Clone the repository

      git clone https://github.com/Nedim7050/credit-card-fraud-detection.git
      cd credit-card-fraud-detection

- Create a virtual environment

      python -m venv .venv
      # Windows
      .venv\Scripts\activate
      # Linux/Mac
      source .venv/bin/activate

- Install dependencies

      pip install -r requirements.txt

- Prepare the dataset

  Option A: Use Kaggle dataset
  - Download creditcard.csv from Kaggle
  - Place it in data/raw/creditcard.csv

  Option B: Generate synthetic data

      python scripts/generate_synthetic.py --rows 100000 --fraud-rate 0.0017

- Preprocess and train the model

      python src/preprocess.py --save-scaler --test-size 0.2
      python src/train.py --use-smote

- Evaluate the model (optional)

      python src/evaluate_model.py

- Run the Streamlit app

      streamlit run app/app.py
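The preprocessing step splits the data with stratified sampling, which matters when only about 0.17% of rows are fraud: a plain random split could leave the test set with almost no positive cases. The sketch below illustrates the idea with the standard library only. The internals of src/preprocess.py are not shown here, so the `stratified_split` helper, the seed, and the toy labels are assumptions. scikit-learn's `train_test_split` with its `stratify` argument is the usual way to do this.

```python
import random


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) keeping each class's share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then carve off the test fraction.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 10,000 transactions with 17 frauds (0.17%).
labels = [1] * 17 + [0] * 9983
train_idx, test_idx = stratified_split(labels, test_size=0.2)
frauds_in_test = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), frauds_in_test)  # → 8000 2000 3
```

Both parts keep roughly the same fraud rate, so metrics computed on the test split are comparable to what the model saw during training.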
- Single Transaction Prediction
  - Enter transaction details (Amount, Time, optional V1-V5 features)
  - Adjust the decision threshold slider
  - Click "Predict Single" or "Try Demo Single" for a quick test
- Batch Prediction
  - Upload a CSV file with transaction data
  - Or use "Generate mini sample" for a quick demo
  - View top 10 highest-risk transactions
  - Download results as CSV
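The decision threshold trades precision against recall: lowering it catches more fraud at the cost of more false alarms. The following sketch shows the effect on a handful of made-up probabilities. The helper names, the sample values, and the `>=` comparison are assumptions for illustration; the app's exact rule may differ.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Label a transaction as fraud (1) when its probability reaches the threshold."""
    return [int(p >= threshold) for p in probabilities]


def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical fraud probabilities and true labels for eight transactions.
probs = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
truth = [1, 1, 0, 1, 0, 0, 0, 0]

for threshold in (0.5, 0.3):
    preds = apply_threshold(probs, threshold)
    precision, recall = precision_recall(truth, preds)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
# → threshold=0.5: precision=0.67, recall=0.67
# → threshold=0.3: precision=0.60, recall=1.00
```

Moving the threshold from 0.5 to 0.3 recovers the fraud case scored at 0.40, but also flags one more legitimate transaction.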
import joblib
import pandas as pd

from src.predict import predict_single, predict_batch

# Load model and scaler
model = joblib.load('models/fraud_model.joblib')
scaler = joblib.load('models/scaler.joblib')

# Single prediction
result = predict_single(
    model, scaler,
    data_dict={'Amount': 250.0, 'Time': 12345.0, 'V1': 2.0},
    threshold=0.5
)
print(f"Fraud probability: {result.probability:.4f}")
print(f"Predicted label: {result.label}")

# Batch prediction
df = pd.read_csv('your_transactions.csv')
results = predict_batch(model, scaler, df, threshold=0.5)

- 01_EDA.ipynb: Explore data distributions, correlations, and class imbalance
- 02_modeling.ipynb: Train models, perform hyperparameter tuning, and generate SHAP plots
Already deployed! Access the live app: https://credit-card-fraud-detection-hrczoamwj8xufv5umabmg6.streamlit.app/
To deploy your own instance:
- Fork this repository
- Go to streamlit.io/cloud
- Connect your GitHub account
- Select your repository
- Set main file path: app/app.py
- Click "Deploy"
Streamlit Cloud will automatically install dependencies from requirements.txt.
streamlit run app/app.py
The app will be available at http://localhost:8501
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Nedim Mejri
- GitHub: @Nedim7050
- Project Link: https://github.com/Nedim7050/credit-card-fraud-detection
- Dataset inspiration from Kaggle Credit Card Fraud Detection
- Streamlit for the web framework
- The open-source ML community
If you find this project helpful, please consider giving it a star!