This project proactively predicts stockout risk within a 14-day international replenishment window and estimates the economic impact of potential shortages.
By applying a custom Business Logic layer to a calibrated XGBoost model, the system identifies high-value replenishment risks, allowing supply chain managers to act before the stock hits zero.
Try the interactive "What-if" simulation dashboard here: https://retail-stockout-risk-scoring.streamlit.app/
Retailers operating global supply chains often face a high risk of stockouts. Initially, this project aimed to predict stockouts using a deterministic rule (Inventory < 10), but an exploratory data audit revealed a data leakage trap: the features fully determined the target, which produced a misleading AUC of 1.0 and generated severe alert fatigue.
The Solution: I engineered an end-to-end Strategic 14-day Warning System. By injecting 5% stochastic noise and re-engineering the target signal based on Sales Velocity vs. Lead-Time, the model achieved a robust and realistic AUC of 0.91. This system isolates high-value revenue risks via a daily MLOps inference suite.
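The target re-definition described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes that "5% noise" means multiplicative Gaussian noise with a 5% relative standard deviation applied to the recorded inventory, and that a stockout is labeled when the days of cover (inventory divided by daily sales velocity) fall below the 14-day window. The function names are hypothetical.

```python
import random

HORIZON_DAYS = 14    # replenishment window stated in this README
NOISE_SIGMA = 0.05   # assumption: "5% noise" read as a 5% relative std. dev.


def days_of_cover(inventory: float, daily_sales: float) -> float:
    """Days until depletion at the current sales velocity."""
    if daily_sales <= 0:
        return float("inf")  # no demand, so the stock never depletes
    return inventory / daily_sales


def noisy_inventory(inventory: float, rng: random.Random) -> float:
    """Perturb the recorded inventory with multiplicative Gaussian noise."""
    return max(0.0, inventory * (1.0 + rng.gauss(0.0, NOISE_SIGMA)))


def stockout_label(inventory: float, daily_sales: float, rng: random.Random) -> int:
    """Return 1 if the noisy stock would run out inside the horizon, else 0."""
    cover = days_of_cover(noisy_inventory(inventory, rng), daily_sales)
    return int(cover < HORIZON_DAYS)


rng = random.Random(42)
# Far from the 14-day boundary the label is stable.
print(stockout_label(inventory=50, daily_sales=10, rng=rng))   # about 5 days of cover
print(stockout_label(inventory=500, daily_sales=10, rng=rng))  # about 50 days of cover
# Near the boundary (140 / 10 = 14 days) the noise can flip the label,
# so no single inventory threshold reproduces the target exactly.
labels = [stockout_label(140, 10, rng) for _ in range(1000)]
print(sum(labels) / len(labels))
```

Under these assumptions, items close to the boundary receive mixed labels, which is what prevents a model from scoring a perfect AUC by memorizing one cutoff.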
To support business prioritization and avoid alerting on low-value items, we compute:
Business Impact Score = Stockout Probability × Price × Sales Velocity
Where:
- Stockout Probability: Calibrated likelihood of depletion within the 14-day window.
- Price: Unit economic value of the product.
- Sales Velocity: Historical units sold (demand speed).
This allows ranking products not only by the probability of shortage but by financial impact, maximizing revenue protection for the company.
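As a worked example of the score, the sketch below ranks three made-up products. The SKUs and all numbers are hypothetical and chosen only to show that a cheap item with a high stockout probability can rank below an expensive, fast-selling item with a lower probability.

```python
from dataclasses import dataclass


@dataclass
class Product:
    sku: str               # hypothetical identifier
    stockout_prob: float   # calibrated probability in [0, 1]
    price: float           # unit price
    sales_velocity: float  # units sold per period

    @property
    def impact_score(self) -> float:
        """Business Impact Score = Stockout Probability x Price x Sales Velocity."""
        return self.stockout_prob * self.price * self.sales_velocity


products = [
    Product("A", stockout_prob=0.90, price=2.00, sales_velocity=10.0),
    Product("B", stockout_prob=0.40, price=50.00, sales_velocity=8.0),
    Product("C", stockout_prob=0.70, price=15.00, sales_velocity=3.0),
]

# Highest expected revenue at risk first.
ranked = sorted(products, key=lambda p: p.impact_score, reverse=True)
for p in ranked:
    print(p.sku, round(p.impact_score, 2))
```

Product B ranks first despite having the lowest stockout probability, because its price and velocity dominate the product of the three terms.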
- Algorithm: XGBoost Classifier (optimized for tabular retail data).
- Evaluation Metric: ROC-AUC (0.9085 achieved on realistic, noisy data).
- Key Methodological Decisions:
- Stochastic Decoupling: Injected Gaussian noise to simulate real-world ERP lags, forcing the model to learn genuine market patterns rather than hard-coded thresholds.
- Data Leakage Prevention: Removed deterministic variables (future demand) from the feature space to ensure the model remains robust in production.
- Cost-Sensitive Learning: Handled class imbalance (85/15) natively using scale_pos_weight to preserve the integrity of predicted probabilities.
- Encapsulated Inference Architecture: The final .pkl artifact contains a custom TransformerMixin class. This allows the interactive Streamlit dashboard to ingest raw user inputs and autonomously handle feature mapping, cyclic time engineering, and imputation on the fly, guaranteeing zero training-serving skew.
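Two of the decisions above reduce to short calculations, sketched here in plain Python. The sketch assumes that 85/15 means 85% negative (no stockout) and 15% positive samples, and that the cyclic time features are sine/cosine encodings of a calendar field such as the month; which fields the real pipeline encodes is not stated here. The helper names are hypothetical, and in the actual artifact the encoding would live inside the custom transformer rather than in a standalone function.

```python
import math


def scale_pos_weight(n_negative: int, n_positive: int) -> float:
    """Ratio of negative to positive samples, the usual starting value
    for XGBoost's scale_pos_weight parameter."""
    return n_negative / n_positive


def cyclic_encode(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value (e.g. month 1-12) onto the unit circle so that
    the end of one cycle sits next to the start of the following one."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Assumed split: 85 negatives for every 15 positives.
print(round(scale_pos_weight(85, 15), 2))

# December and January end up as close as any two consecutive months.
dec, jan, feb = (cyclic_encode(m, 12) for m in (12, 1, 2))
print(round(math.dist(dec, jan), 4), round(math.dist(jan, feb), 4))
```

Encoding the month as a raw integer would place December (12) and January (1) eleven units apart; the sine/cosine pair removes that artificial jump.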
retail-stockout-risk-scoring
- 02_Data/01_Raw/ - Original inventory dataset (retail_store_inventory.csv)
- 03_Notebooks/
  - 01_setup_and_healing.ipynb - Environment setup & Stochastic Noise Injection
  - 02_eda.ipynb - Data validation & Leakage identification
  - 03_feature_engineering.ipynb - 14-day Target re-definition & cyclic time features
  - 04_feature_preselection.ipynb - Leakage prevention & feature importance ranking
  - 05_modeling_classification.ipynb - XGBoost training & AUC validation (0.91)
  - 06_production_framework_mlops.ipynb - End-to-end MLOps scripts (Retraining & Alerts)
  - 07_streamlit_pipeline_packaging.ipynb - Serialization of the "Black-Box" Pipeline artifact
- 04_Models/full_pipeline_14day_strategic.pkl - Serialized pipeline loaded by Streamlit
- app.py - Streamlit simulation application
- requirements.txt - Python dependencies
- README.md - Documentation (this file)
# Clone the repository
git clone https://github.com/yourusername/retail-stockout-risk-scoring.git
cd retail-stockout-risk-scoring
# Create and activate environment (optional)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run Streamlit app
streamlit run app.py