This project implements a credit risk assessment system that predicts the probability of loan default. It combines machine learning models with statistical analysis and feature engineering.
- Traditional credit scoring systems lack flexibility and fail to capture complex patterns
- Manual feature engineering is time-consuming and may miss important relationships
- Hyperparameter tuning requires extensive expertise and computational resources
- Model interpretability is crucial for regulatory compliance and business decisions
An ML pipeline featuring:
- Comprehensive EDA: Statistical tests, distribution analysis, and correlation studies
- Advanced Feature Engineering: WoE (Weight of Evidence) transformation and IV (Information Value) scoring of features
- Multiple Optimization Strategies: Optuna, GridSearchCV, and RandomizedSearchCV
- Statistical Validation: KS (Kolmogorov-Smirnov) statistic to measure how well the model separates defaulters from non-defaulters
- Multiple Encoding Techniques: One-hot, label, target, frequency, and WoE encoding
- Production Deployment: Interactive Streamlit dashboard for real-time predictions
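
To make the WoE/IV and KS items above concrete, here is a minimal sketch of both calculations. It is not taken from the project's code: the function names, the `eps` smoothing constant, and the sign convention (WoE as the log of the non-default share over the default share, with `1` meaning default) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def woe_iv(df: pd.DataFrame, feature: str, target: str, eps: float = 0.5):
    """Return (per-category WoE table, total IV) for one categorical feature.

    Assumed convention: WoE = ln(share of non-defaults / share of defaults),
    where target == 1 means default. `eps` is a smoothing count added to every
    cell so that an empty cell does not produce log(0).
    """
    counts = pd.crosstab(df[feature], df[target])
    counts = counts.reindex(columns=[0, 1], fill_value=0) + eps
    good_share = counts[0] / counts[0].sum()
    bad_share = counts[1] / counts[1].sum()
    table = pd.DataFrame({"good_share": good_share, "bad_share": bad_share})
    table["woe"] = np.log(table["good_share"] / table["bad_share"])
    table["iv"] = (table["good_share"] - table["bad_share"]) * table["woe"]
    return table, float(table["iv"].sum())


def ks_statistic(y_true, y_score) -> float:
    """Largest gap between the score CDFs of defaults and non-defaults."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    thresholds = np.unique(y_score)  # evaluate both CDFs at each distinct score
    bad = np.sort(y_score[y_true == 1])
    good = np.sort(y_score[y_true == 0])
    cdf_bad = np.searchsorted(bad, thresholds, side="right") / bad.size
    cdf_good = np.searchsorted(good, thresholds, side="right") / good.size
    return float(np.max(np.abs(cdf_bad - cdf_good)))


# Toy data with hypothetical column names, only to show the call pattern.
demo = pd.DataFrame({
    "grade": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "default": [0, 0, 0, 1, 0, 1, 1, 1],
})
woe_table, iv = woe_iv(demo, "grade", "default")
ks = ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

Evaluating the CDFs only at distinct score values keeps tied scores from inflating the KS value. For continuous features, WoE is usually computed after binning, which this sketch leaves out.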
The project integrates three interconnected datasets for comprehensive credit risk analysis:
| Dataset | Description |
|---|---|
| bureau_data.csv | Credit bureau data with payment history, credit utilization, and delinquency records |
| customers.csv | Customer demographics including income, employment, and personal details |
| loans.csv | Loan details with amount, term, interest rate, and default status (target variable) |
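
The three files can be combined into a single modelling table along the lines of the sketch below. The key columns `cust_id` and `loan_id` are hypothetical, since the join keys are not stated here, so substitute whatever identifiers the CSVs actually share.

```python
import pandas as pd


def merge_credit_tables(customers, loans, bureau,
                        customer_key="cust_id", loan_key="loan_id"):
    """Join the three tables into one frame with one row per loan.

    The key names are assumptions. `validate` makes pandas raise an error if
    the expected relationship between tables does not hold, for example when
    a customer ID is duplicated in `customers`.
    """
    merged = loans.merge(customers, on=customer_key, how="left",
                         validate="many_to_one")
    merged = merged.merge(bureau, on=loan_key, how="left",
                          validate="one_to_one")
    return merged


# Typical call, assuming the files sit in the working directory:
# df = merge_credit_tables(pd.read_csv("customers.csv"),
#                          pd.read_csv("loans.csv"),
#                          pd.read_csv("bureau_data.csv"))
```

Left joins starting from the loans table keep every loan, and with it every target label, even when a customer or bureau record is missing.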
📋 Bureau Data Features
Credit History:
- Credit account age
- Number of active/closed accounts
- Credit mix (mortgage, auto, personal loans)
Payment Behavior:
- Payment history (on-time vs late)
- Number of delinquencies (30, 60, 90+ days)
- Derogatory marks
Credit Utilization:
- Total credit limit
- Total outstanding balance
- Utilization ratio
Inquiries:
- Hard inquiry count (last 6 months)
- Soft inquiry count
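
As an illustration of the utilization ratio listed above, the sketch below divides outstanding balance by credit limit. The function and argument names are hypothetical, and treating a zero or negative limit as missing is an assumption, not something the dataset description specifies.

```python
import pandas as pd


def utilization_ratio(balance: pd.Series, limit: pd.Series) -> pd.Series:
    """Outstanding balance divided by credit limit.

    Rows with a non-positive or missing limit yield NaN instead of inf,
    so they can be imputed or flagged explicitly later.
    """
    safe_limit = limit.where(limit > 0)  # non-positive limits become NaN
    return balance / safe_limit
```

Returning NaN for accounts without a usable limit avoids infinite values that would otherwise distort scaling or binning steps.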
👤 Customer Data Features
Demographics:
- Age
- Gender
- Marital status
- Number of dependents
Employment:
- Employment status
- Employment length
- Job title/category
- Income stability
Financial:
- Annual income
- Monthly income
- Income source
Geographic:
- State/City
- ZIP code
- Urban/Rural classification
💰 Loan Data Features
Loan Details:
- Loan amount requested
- Loan term (months)
- Interest rate
- Monthly installment
Loan Characteristics:
- Loan purpose (debt consolidation, home improvement, etc.)
- Loan grade/sub-grade
- Application type
Status:
- Loan status (current, fully paid, charged off)
- Target Variable: Default (0 = No Default, 1 = Default)
Dates:
- Application date
- Issue date
- Last payment date
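
The loan amount, term, interest rate, and monthly installment are related. As a sanity check on those columns, the sketch below applies the standard fixed-rate amortization formula. It assumes a fully amortizing loan with the interest rate quoted as an annual percentage. The dataset may use a different convention, and the function name is hypothetical.

```python
def monthly_installment(principal: float, annual_rate_pct: float,
                        term_months: int) -> float:
    """Fixed monthly payment: P * r / (1 - (1 + r) ** -n), with r = monthly rate."""
    if term_months <= 0:
        raise ValueError("term_months must be positive")
    r = annual_rate_pct / 100 / 12  # annual percentage -> monthly fraction
    if r == 0:
        return principal / term_months  # interest-free loan: equal principal slices
    return principal * r / (1 - (1 + r) ** -term_months)
```

Comparing this value with the recorded installment is one way to spot rows where the rate or term was entered in different units.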