🟢 Working on an ML analysis of event logs


Hi there 👋

I'm Shashank, a Data & AI practitioner who treats data science as a toolkit for changing how decisions get made. For me, a model is never the destination; it's the first step in a chain of decision design. A prediction matters only when it enables the right action at the right moment, and that end-to-end chain is what I build.

📍 India  |  🎓 B.Tech CSE, IIT Guwahati  |  LinkedIn


Projects

This system routes each predicted churner to a retention action. Each churner is matched to a behavioral persona via SHAP-based scoring, and that persona, cross-referenced with the customer's CLV segment, determines which campaign fires. Model selection followed the same logic: CatBoost was chosen over XGBoost not on overall AP (0.916 for both) but on VIP-segment AP specifically (0.9328 vs 0.9077), because that segment is where the retention decision carries the most business weight.
CatBoost · XGBoost · MLflow · Docker · GCP Cloud Run · FastAPI · Streamlit · SHAP
🚀 Deployed on GCP Cloud Run with an MLflow-managed model registry
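A minimal, stdlib-only sketch of the two decision rules described above: routing a churner through persona × CLV segment, and selecting a model on segment-level AP rather than overall AP. The persona names, feature names, campaign table, and the "sum of positive SHAP contributions" scoring rule are hypothetical placeholders, not the project's actual configuration; per-customer SHAP values are assumed to be precomputed.

```python
# Hypothetical personas, each defined by the features that characterize it.
PERSONA_FEATURES = {
    "price_sensitive": ["discount_usage", "coupon_rate"],
    "disengaged": ["days_since_last_order", "low_session_count"],
    "service_issue": ["complaint_count", "late_deliveries"],
}

# Hypothetical (persona, CLV segment) -> campaign table.
CAMPAIGNS = {
    ("price_sensitive", "VIP"): "personal_discount_call",
    ("price_sensitive", "Standard"): "coupon_email",
    ("disengaged", "VIP"): "account_manager_outreach",
    ("service_issue", "VIP"): "priority_support_callback",
}


def assign_persona(shap_values):
    """Assumed rule: score each persona by the positive (churn-increasing) SHAP
    contributions of its features and return the highest-scoring persona."""
    def score(persona):
        return sum(max(shap_values.get(f, 0.0), 0.0) for f in PERSONA_FEATURES[persona])
    return max(PERSONA_FEATURES, key=score)


def route(shap_values, clv_segment, default="generic_winback_email"):
    """Map a predicted churner to a campaign via persona x CLV segment."""
    persona = assign_persona(shap_values)
    return persona, CAMPAIGNS.get((persona, clv_segment), default)


def average_precision(y_true, scores):
    """Step-wise AP: sum over thresholds of (R_n - R_{n-1}) * P_n.
    Tied scores are consumed together as a single threshold."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("AP is undefined when there are no positive labels")
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall, precision = tp / total_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def pick_model(y_true, model_scores, segments, target="VIP"):
    """Choose the model with the best AP computed on the target segment only."""
    idx = [i for i, s in enumerate(segments) if s == target]

    def segment_ap(name):
        return average_precision([y_true[i] for i in idx],
                                 [model_scores[name][i] for i in idx])
    return max(model_scores, key=segment_ap)


shap_row = {"discount_usage": 0.30, "coupon_rate": 0.10, "days_since_last_order": 0.20}
print(route(shap_row, "VIP"))  # ('price_sensitive', 'personal_discount_call')
```

Scoring on the VIP subset lets one model win even when overall AP is tied, which is exactly the situation of the 0.916 vs 0.916 comparison.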

Volatility forecasting built on the premise that a single model fitted across all market conditions obscures more than it reveals. A 2-state Gaussian HMM identifies low- and high-volatility regimes, and separate GARCH and EGARCH models run per regime, with EGARCH selected for the high-volatility state on AIC (−4.9216 vs −4.9118; lower is better). The system is validated with walk-forward backtesting over 1,000 trading days, the only honest way to test a model that will be used sequentially. It achieves an MAE of 0.2338 vs 0.2606 on a fixed-split baseline, a 10.3% improvement that only shows up when you test the way the model will actually be used.
HMM · GARCH · EGARCH · statsmodels · walk-forward backtesting
🌐 Live hosted app
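A simplified sketch of the evaluation logic, assuming absolute daily returns as the realized-volatility target. `fit_mean_abs` is a hypothetical stand-in for the regime-switching GARCH/EGARCH fit, which would come from a dedicated estimation library; the point is the difference between refitting at every step (walk-forward) and fitting once (fixed split), plus the AIC selection rule and the improvement arithmetic using the figures quoted above.

```python
from statistics import mean


def select_by_aic(aic_by_model):
    """Return the model name with the lowest AIC (lower is better)."""
    return min(aic_by_model, key=aic_by_model.get)


def fit_mean_abs(history):
    """Hypothetical stand-in model: forecast next-day volatility as the
    mean absolute return over the history it is given."""
    return mean(abs(r) for r in history)


def walk_forward_mae(returns, initial, window=None, fit=fit_mean_abs):
    """Refit before every step using only data up to day t-1, forecast day t, roll forward."""
    errors = []
    for t in range(initial, len(returns)):
        history = returns[:t] if window is None else returns[max(0, t - window):t]
        forecast = fit(history)
        errors.append(abs(forecast - abs(returns[t])))  # |r_t| as the volatility proxy
    return mean(errors)


def fixed_split_mae(returns, initial, fit=fit_mean_abs):
    """Baseline: fit once on the training split and never update."""
    forecast = fit(returns[:initial])
    return mean(abs(forecast - abs(r)) for r in returns[initial:])


def relative_improvement(baseline_mae, candidate_mae):
    return (baseline_mae - candidate_mae) / baseline_mae


print(select_by_aic({"GARCH": -4.9118, "EGARCH": -4.9216}))  # EGARCH
print(f"{relative_improvement(0.2606, 0.2338):.1%}")  # 10.3%
```

On a series whose volatility shifts after the training split, the fixed-split forecast stays anchored to the old level while the walk-forward forecast adapts, which is why the two evaluation schemes can rank the same model differently.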

The decision here is lending risk classification across four tiers (P1–P4) without relying solely on a credit bureau score. A decision tree router first separates records by risk band, then segment-specific XGBoost classifiers handle each band, reaching 79% weighted accuracy across 42,064 records. A separate exploration notebook surfaces a finding that matters for any real deployment: even a well-fitted reconstructed credit score (R² > 0.9) still degrades downstream classification, which changes how you'd think about feature sourcing in production.
XGBoost · scikit-learn · Flask · two-stage pipeline
💻 Locally runnable Flask app (python src/deploy.py)
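The router/expert structure can be sketched as a small wrapper. The two bands ("low" covering P1/P2 and "high" covering P3/P4), the feature names, and the hand-written rules are hypothetical stand-ins for the fitted decision tree and the per-band XGBoost classifiers; any callable that maps a record to a label could be plugged in instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping

Record = Mapping[str, float]
Model = Callable[[Record], str]


@dataclass
class TwoStagePipeline:
    """Stage 1 routes each record to a risk band; stage 2 applies that band's own classifier."""
    router: Model              # stand-in for the fitted decision tree router
    experts: Dict[str, Model]  # stand-ins for one classifier per band

    def predict(self, record: Record) -> str:
        band = self.router(record)
        if band not in self.experts:
            raise KeyError(f"no classifier registered for band {band!r}")
        return self.experts[band](record)


def toy_router(r: Record) -> str:
    # Hypothetical rule in place of a trained tree.
    return "high" if r["num_delinquencies"] > 0 else "low"


pipeline = TwoStagePipeline(
    router=toy_router,
    experts={
        "low": lambda r: "P1" if r["income_to_debt"] >= 3 else "P2",
        "high": lambda r: "P3" if r["num_delinquencies"] <= 2 else "P4",
    },
)

print(pipeline.predict({"num_delinquencies": 0, "income_to_debt": 4.0}))  # P1
print(pipeline.predict({"num_delinquencies": 5, "income_to_debt": 4.0}))  # P4
```

Keeping the router and the experts as separate callables means each band's classifier can be retrained or swapped without touching the others.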


Practice & Skill-Building

A 2026 deep learning re-entry that progresses from a feedforward baseline on CIFAR-10, through CNN modeling, to residual connections implemented with both the Keras Functional API and tf.keras.Model subclassing. The aim is to rebuild momentum and foundational depth before tackling more complex architectures.
TensorFlow · Keras · CNN · ResNet · CIFAR-10
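For reference, the residual connection in its standard ResNet notation (general background, not taken from the notebooks): a block learns a residual mapping F and adds its input back, with a learned projection W_s only when the shapes differ. The identity term in the Jacobian is what gives gradients a direct path through deep stacks.

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + W_s \mathbf{x} \quad \text{(shapes differ)}

\frac{\partial \mathbf{y}}{\partial \mathbf{x}}
  = \frac{\partial \mathcal{F}(\mathbf{x}, \{W_i\})}{\partial \mathbf{x}} + I
```

In Keras the addition corresponds to a `tf.keras.layers.Add` layer in the Functional API, or a plain `+` on tensors inside a subclassed model's `call`.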


In Progress

Decision system built on real event log data, combining process mining and machine learning to surface what actually drives outcomes in sequential, timestamped data. Data processing is complete, and the project is deliberately paused: the decision-design layer deserves to be done right, and I want more project depth before committing to the design choices that will define it.
process mining · event logs · sequence modeling
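As an illustration of the kind of sequence features involved (per-case traces and waiting-time gaps, as listed in the temporal-sequence-eventLog repo description), here is a stdlib sketch. The `(case_id, activity, timestamp)` row layout, the feature names, and the `(timestamp, activity)` tie-break used for deterministic ordering are assumptions, not the project's actual schema.

```python
from collections import defaultdict
from datetime import datetime


def build_traces(rows):
    """Group (case_id, activity, iso_timestamp) rows into per-case ordered traces.

    Sorting each case on (timestamp, activity) is an assumed tie-break so that
    events sharing a timestamp always come out in the same order.
    """
    by_case = defaultdict(list)
    for case_id, activity, ts in rows:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))
    return {case: sorted(events) for case, events in by_case.items()}


def gap_features(trace):
    """Sequence features built from the gaps (in hours) between consecutive events."""
    gaps = [
        (later_ts - earlier_ts).total_seconds() / 3600
        for (earlier_ts, _), (later_ts, _) in zip(trace, trace[1:])
    ]
    return {
        "n_events": len(trace),
        "activities": [activity for _, activity in trace],
        "total_elapsed_h": sum(gaps),
        "max_gap_h": max(gaps, default=0.0),
        "mean_gap_h": sum(gaps) / len(gaps) if gaps else 0.0,
    }


rows = [
    ("c1", "Submit", "2024-01-01T09:00"),
    ("c1", "Approve", "2024-01-01T15:00"),
    ("c1", "Review", "2024-01-01T10:00"),  # out of order in the raw log
    ("c2", "Submit", "2024-01-02T08:00"),
]
traces = build_traces(rows)
print(gap_features(traces["c1"]))  # activities Submit, Review, Approve; gaps of 1 h and 5 h
```

For early risk prediction, the same function can be applied to a prefix of each trace (for example `gap_features(trace[:k])` for the first k events), so the features only use information available at prediction time.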


Coursework & Learning

Repository | Description
google-advanced-data-analytics | Google Advanced Data Analytics Professional Certificate exercises
google-cloud-ai-agents | Hands-on work from Google Cloud Gen AI Academy, Cohort 1
datacamp-ds-cert | DataCamp Data Scientist certification coursework
iitg-cse | B.Tech Computer Science coursework, IIT Guwahati

Toolkit

Python · Predictive · Time-Series · SQL · GCP · Docker · MLflow · CI/CD · Power BI · scikit-learn · statsmodels · SHAP · FastAPI · Flask · Streamlit

Pinned

  1. customer-churn-prediction-segmentation-revenue-risk-analysis (Public)

    End-to-end e-commerce customer churn prediction pipeline with LTV-segmented scoring, SHAP-based persona discovery, customer persona assignment, and segment-aware retention strategies

    Jupyter Notebook

  2. time-series-analysis-forecasting (Public)

    A structured project for time series analysis and forecasting using statistical methods

    Jupyter Notebook 1

  3. credit-risk-assessment (Public)

    Machine learning system that predicts borrower credit risk (P1–P4) using financial and personal data, helping financial institutions make data-driven lending decisions.

    Jupyter Notebook

  4. temporal-sequence-eventLog (Public)

    Workflow event logs to ML-based predictive timelines: deterministic reconstruction, sequence features (waiting/time gaps), and early risk prediction for prioritization.

    Jupyter Notebook

  5. machine-learning-study-2023 (Public)

    This repo contains my ML and DL study and practice from 2023.

    Jupyter Notebook

  6. health-insurance-cross-sell-predictor (Public)

    Uses a health insurance company's customer data to predict which customers are interested in vehicle insurance.

    Jupyter Notebook