Shashank Garewal shashankgarewal

Hi there 👋

I'm Shashank, a Data & AI practitioner who treats data science as a toolkit for changing how decisions get made. For me, a model is never the destination. It's the first step in a chain of decision design. A prediction matters when it enables the right action at the right moment. That's the system I build.

📍 India | 🎓 B.Tech CSE, IIT Guwahati

Projects

Customer Churn Prediction, Segmentation & Revenue Risk Analysis

This system routes each predicted churner to a retention action. SHAP-based scoring matches the churner to a behavioral persona, and that persona, cross-referenced with the customer's CLV segment, determines which campaign fires. Model selection followed the same logic: CatBoost was chosen over XGBoost not on overall AP (0.916 each) but on VIP-segment AP (0.9328 vs 0.9077), because that is the segment where the retention decision carries the most business weight.
CatBoost XGBoost MLflow Docker GCP Cloud Run FastAPI Streamlit SHAP
🚀 Deployed on GCP Cloud Run with MLflow-controlled model registry
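
A minimal sketch of the persona-by-CLV routing idea described above. The persona names, segment names, campaign names, and the 0.5 threshold are hypothetical placeholders, not the values used in the project, and the SHAP-based persona scoring is assumed to have already produced the `persona` label.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (persona, CLV segment) -> campaign table.
# The real personas come from SHAP-based scoring, which is not reproduced here.
CAMPAIGNS: Dict[Tuple[str, str], str] = {
    ("price_sensitive", "vip"): "personal_discount_call",
    ("price_sensitive", "standard"): "discount_email",
    ("low_engagement", "vip"): "account_manager_outreach",
    ("low_engagement", "standard"): "reengagement_email",
}
DEFAULT_CAMPAIGN = "generic_retention_email"


def route(
    churn_probability: float,
    persona: str,
    clv_segment: str,
    threshold: float = 0.5,  # assumed cut-off, for illustration only
) -> Optional[str]:
    """Return the campaign for a predicted churner, or None if no action is needed."""
    if churn_probability < threshold:
        return None  # not a predicted churner, so no campaign fires
    # Unknown combinations fall back to a generic campaign.
    return CAMPAIGNS.get((persona, clv_segment), DEFAULT_CAMPAIGN)


if __name__ == "__main__":
    print(route(0.82, "low_engagement", "vip"))
```

Keeping the mapping in a plain table means a campaign can be changed without retraining or redeploying the model.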

Time Series Analysis & Volatility Forecasting

Volatility forecasting built on the premise that a single model fitted across all market conditions obscures more than it reveals. A 2-state Gaussian HMM identifies low- and high-volatility regimes. Separate GARCH and EGARCH models run per regime, with EGARCH selected for the high-volatility state on AIC (−4.9216 vs −4.9118; lower is better). Validation uses walk-forward backtesting over 1,000 trading days, the only honest way to test a model that will be used sequentially. It achieves 0.2338 MAE vs 0.2606 on a fixed-split baseline, a 10.3% improvement that a single train/test split would not reveal.
HMM GARCH EGARCH statsmodels walk-forward backtesting
🌐 Live hosted app
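
A minimal sketch of walk-forward evaluation, assuming an expanding training window and one-step-ahead forecasts. The `naive_forecast` function is a stand-in for the regime-specific GARCH/EGARCH models, which are not reproduced here; all function names are illustrative.

```python
from statistics import mean
from typing import Callable, Sequence


def walk_forward_mae(
    series: Sequence[float],
    forecast: Callable[[Sequence[float]], float],
    min_train: int = 5,
) -> float:
    """Refit on all data up to t, predict t, then step forward one period."""
    if not 1 <= min_train < len(series):
        raise ValueError("min_train must be at least 1 and less than len(series)")
    errors = []
    for t in range(min_train, len(series)):
        history = series[:t]            # only past observations are visible
        prediction = forecast(history)  # one-step-ahead forecast
        errors.append(abs(series[t] - prediction))
    return mean(errors)


def naive_forecast(history: Sequence[float], window: int = 3) -> float:
    """Placeholder model: average of the last `window` observations."""
    return mean(history[-window:])


def relative_improvement(baseline: float, candidate: float) -> float:
    """Fractional error reduction of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    # Reproduces the arithmetic behind the reported improvement.
    print(f"{relative_improvement(0.2606, 0.2338):.1%}")
```

The last function shows where the reported figure comes from: (0.2606 − 0.2338) / 0.2606 ≈ 10.3%.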

Credit Risk Assessment

The decision here is lending risk classification across four tiers (P1–P4) without relying solely on a credit bureau score. A decision tree router first separates records by risk band, then segment-specific XGBoost classifiers handle each band, reaching 79% weighted accuracy across 42,064 records. A separate exploration notebook surfaces a finding that matters for any real deployment: even a well-fitted reconstructed credit score (R² > 0.9) still degrades downstream classification, which changes how you'd think about feature sourcing in production.
XGBoost scikit-learn Flask two-stage pipeline
💻 Locally runnable Flask app (python src/deploy.py)
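
A minimal sketch of the two-stage pattern described above: a router picks a risk band, then a band-specific classifier assigns the tier. Plain functions stand in for the decision tree router and the XGBoost classifiers, and the feature names, band names, and thresholds are hypothetical.

```python
from typing import Callable, Dict, Mapping

Record = Mapping[str, float]
Router = Callable[[Record], str]
Classifier = Callable[[Record], str]


class TwoStagePipeline:
    """Stage 1 routes a record to a band; stage 2 classifies within that band."""

    def __init__(self, router: Router, classifiers: Dict[str, Classifier]) -> None:
        self.router = router
        self.classifiers = classifiers

    def predict(self, record: Record) -> str:
        band = self.router(record)
        if band not in self.classifiers:
            raise ValueError(f"no classifier registered for band {band!r}")
        return self.classifiers[band](record)


# Hypothetical stand-ins for the trained models.
def route_by_delinquency(record: Record) -> str:
    return "low_risk" if record["delinquencies"] == 0 else "high_risk"


def classify_low_risk(record: Record) -> str:
    return "P1" if record["utilization"] < 0.3 else "P2"


def classify_high_risk(record: Record) -> str:
    return "P3" if record["delinquencies"] < 3 else "P4"


pipeline = TwoStagePipeline(
    router=route_by_delinquency,
    classifiers={"low_risk": classify_low_risk, "high_risk": classify_high_risk},
)

if __name__ == "__main__":
    print(pipeline.predict({"delinquencies": 0, "utilization": 0.1}))
```

Each band-specific classifier only ever sees records from its own band, so it can specialize on the distinctions that matter there.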

Practice & Skill-Building

Keras / ResNet Refresh 2026

A 2026 deep learning re-entry that progresses from a feedforward baseline on CIFAR-10 through CNN modeling to residual connections, implemented with both the Functional API and tf.keras.Model subclassing. Built to regain momentum and foundational depth before tackling more complex architectures.
TensorFlow Keras CNN ResNet CIFAR-10

In Progress

Temporal Sequence & Event Log Analysis

Decision system built on real event log data, combining process mining and machine learning to surface what actually drives outcomes in sequential, timestamped data. Data processing is complete, and the project is deliberately paused: the decision-design layer deserves to be done right, and I want more project depth before committing to the design choices that will define this one.
process mining event logs sequence modeling

Coursework & Learning

Repository	Description
google-advanced-data-analytics	Google Advanced Data Analytics Professional Certificate exercises
google-cloud-ai-agents	Hands-on work from Google Cloud Gen AI Academy, Cohort 1
datacamp-ds-cert	DataCamp Data Scientist certification coursework
iitg-cse	B.Tech Computer Science coursework, IIT Guwahati

Toolkit

Python · Predictive · Time-Series · SQL · GCP · Docker · MLflow · CI/CD · Power BI · scikit-learn · statsmodels · SHAP · FastAPI · Flask · Streamlit
