I'm Shashank, a Data & AI practitioner who treats data science as a toolkit for changing how decisions get made. For me, a model is never the destination. It's the first step in a chain of decision design. A prediction matters when it enables the right action at the right moment. That's the system I build.
📍 India | 🎓 B.Tech CSE, IIT Guwahati
This system routes each predicted churner to a retention action. SHAP-based scoring matches the churner to a behavioral persona, and that persona, cross-referenced with the customer's CLV segment, determines which campaign fires. Model selection followed the same logic: CatBoost was chosen over XGBoost not on overall average precision (AP, 0.916 for both) but on VIP-segment AP (0.9328 vs 0.9077), because that segment is where the retention decision carries the most business weight.
CatBoost XGBoost MLflow Docker GCP Cloud Run FastAPI Streamlit SHAP
🚀 Deployed on GCP Cloud Run with MLflow-controlled model registry
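The persona-by-CLV routing described above can be pictured as a lookup table. The sketch below is illustrative only: the persona labels, CLV tiers, campaign names, and the 0.5 threshold are all hypothetical placeholders, not the values used in the deployed system.

```python
from typing import NamedTuple, Optional


class Customer(NamedTuple):
    customer_id: str
    churn_probability: float
    persona: str      # hypothetical label, e.g. assigned from SHAP-based scoring
    clv_segment: str  # hypothetical CLV tier


# Hypothetical (persona, CLV segment) -> campaign table; every name is illustrative.
CAMPAIGNS = {
    ("price_sensitive", "vip"): "personal_discount_call",
    ("price_sensitive", "standard"): "discount_email",
    ("low_engagement", "vip"): "account_manager_outreach",
    ("low_engagement", "standard"): "reengagement_email",
}
DEFAULT_CAMPAIGN = "generic_retention_email"


def route(customer: Customer, threshold: float = 0.5) -> Optional[str]:
    """Return a campaign for a predicted churner, or None if below the threshold."""
    if customer.churn_probability < threshold:
        return None  # not predicted to churn, so no retention action fires
    key = (customer.persona, customer.clv_segment)
    # Unknown persona/segment pairs fall back to a generic campaign.
    return CAMPAIGNS.get(key, DEFAULT_CAMPAIGN)
```

Keeping the mapping as data rather than nested conditionals means a campaign change is a one-line edit, and the fallback makes unmapped combinations explicit instead of silent.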
Volatility forecasting built on the premise that a single model fitted across all market conditions obscures more than it reveals. A 2-state Gaussian HMM identifies low- and high-volatility regimes. Separate GARCH and EGARCH models run per regime, with EGARCH selected for the high-volatility state on AIC (−4.9216 vs −4.9118). Validation uses walk-forward backtesting over 1,000 trading days, which mirrors how a sequentially deployed model is actually used. The result is 0.2338 MAE vs 0.2606 on a fixed-split baseline, a 10.3% improvement that a single train/test split would not reveal.
HMM GARCH EGARCH statsmodels walk-forward backtesting
🌐 Live hosted app
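Walk-forward backtesting, as used in the volatility project above, can be sketched in a few lines. This is a minimal illustration, assuming an expanding training window and a naive stand-in forecaster; the project itself fits HMM-conditioned GARCH/EGARCH models, and the function names here are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence


def walk_forward_mae(
    series: Sequence[float],
    fit_predict: Callable[[Sequence[float]], float],
    min_train: int = 3,
) -> float:
    """Expanding-window walk-forward MAE.

    At each step t the forecaster sees only series[:t] and predicts series[t],
    so no future observation leaks into any forecast.
    """
    if len(series) <= min_train:
        raise ValueError("series must be longer than min_train")
    errors = []
    for t in range(min_train, len(series)):
        forecast = fit_predict(series[:t])  # refit on all data up to t
        errors.append(abs(series[t] - forecast))
    return mean(errors)


# Naive stand-in forecasters (assumptions, not the project's models).
def last_value(history: Sequence[float]) -> float:
    return history[-1]


def historical_mean(history: Sequence[float]) -> float:
    return mean(history)
```

Any model that can be refit on a history and asked for a one-step forecast can be passed as `fit_predict`, which makes it straightforward to compare candidates under the same sequential protocol.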
The decision here is lending risk classification across four tiers (P1–P4) without relying solely on a credit bureau score. A decision tree router first separates records by risk band, then segment-specific XGBoost classifiers handle each band, reaching 79% weighted accuracy across 42,064 records. A separate exploration notebook surfaces a finding that matters for any real deployment: even a well-fitted reconstructed credit score (R² > 0.9) still degrades downstream classification, which changes how you'd think about feature sourcing in production.
XGBoost scikit-learn Flask two-stage pipeline
💻 Locally runnable Flask app (python src/deploy.py)
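The two-stage shape of the lending pipeline above (a router picks a risk band, then a band-specific model assigns the tier) can be sketched as follows. The feature names, band labels, and rule-based stand-ins are hypothetical; in the project the router is a decision tree and the band models are XGBoost classifiers.

```python
from typing import Callable, Dict

Record = Dict[str, float]


class TwoStageClassifier:
    """Route each record to a band, then let that band's model pick the tier."""

    def __init__(
        self,
        router: Callable[[Record], str],
        experts: Dict[str, Callable[[Record], str]],
    ) -> None:
        self.router = router
        self.experts = experts

    def predict(self, record: Record) -> str:
        band = self.router(record)         # stage 1: coarse risk band
        return self.experts[band](record)  # stage 2: band-specific tier


# Hypothetical rule-based stand-ins for the trained models.
def band_router(record: Record) -> str:
    return "low_risk" if record["delinquencies"] == 0 else "high_risk"


def low_risk_expert(record: Record) -> str:
    return "P1" if record["utilization"] < 0.3 else "P2"


def high_risk_expert(record: Record) -> str:
    return "P3" if record["utilization"] < 0.7 else "P4"


pipeline = TwoStageClassifier(
    router=band_router,
    experts={"low_risk": low_risk_expert, "high_risk": high_risk_expert},
)
```

Because each stage is just a callable, a trained model's prediction function could be dropped in for either stand-in without changing the pipeline class.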
A 2026 re-entry into deep learning that progresses from a feedforward baseline on CIFAR-10, through CNN modeling, to a study of residual connections using both the Functional API and tf.keras.Model subclassing. Built to rebuild momentum and foundational depth before tackling more complex architectures.
TensorFlow Keras CNN ResNet CIFAR-10
Decision system built on real event log data, combining process mining and machine learning to surface what actually drives outcomes in sequential, timestamped data. Data processing is complete, and the project is deliberately paused: the decision-design layer deserves to be done right, and I want more project depth before committing to the design choices that will define this one.
process mining event logs sequence modeling
| Repository | Description |
|---|---|
| google-advanced-data-analytics | Google Advanced Data Analytics Professional Certificate exercises |
| google-cloud-ai-agents | Hands-on work from Google Cloud Gen AI Academy, Cohort 1 |
| datacamp-ds-cert | DataCamp Data Scientist certification coursework |
| iitg-cse | B.Tech Computer Science coursework, IIT Guwahati |
Python · Predictive · Time-Series · SQL · GCP · Docker · MLflow · CI/CD · Power BI · scikit-learn · statsmodels · SHAP · FastAPI · Flask · Streamlit
