42kiko/Corporacion-Favorita-Grocery-Sales-Forecasting

📦 Corporación Favorita Grocery Sales Forecasting

A modular, clean, and scalable forecasting pipeline with Streamlit, Plotly, and Parquet preprocessing.


👋 Welcome to the Project

This repository implements a fully structured data-science workflow for the Kaggle Corporación Favorita Sales Forecasting dataset, including:

  • efficient CSV → Parquet preprocessing
  • modular data loading and region filtering
  • rich Exploratory Data Analysis (overview + deep dive)
  • reusable Plotly functions for both notebooks and Streamlit
  • a future interactive Streamlit forecasting app

All components follow a clean, maintainable architecture so you can reuse them across multiple projects.


🚀 Quickstart

macOS (zsh / bash)

python3 -m venv .venv && \
source .venv/bin/activate && \
python -m pip install --upgrade pip && \
pip install -e ".[dev]"

Windows PowerShell

python -m venv .venv; `
. .\.venv\Scripts\Activate.ps1; `
python -m pip install --upgrade pip; `
pip install -e ".[dev]"

Windows CMD

python -m venv .venv && .\.venv\Scripts\activate && python -m pip install --upgrade pip && pip install -e ".[dev]"

🧠 VS Code Interpreter Setup

If VS Code does not automatically select the correct Python interpreter:

# Open the Command Palette:
# macOS: cmd + shift + p
# Windows / Linux: ctrl + shift + p

# Then search for:
"Python: Select Interpreter"

EDA Notebook

# To run the entire project, including the notebooks that generate images and reports, install the [viz] extra as well
pip install -e ".[viz,dev]"

📊 Exploratory Data Analysis Dashboard

This section summarizes the main visual insights generated in the EDA notebooks. It is structured into:

  • Overview EDA – high-level behavior across stores, items, time and promotions
  • Deep Dive EDA – focused views on holidays, oil, items, stores, train and transactions

🟦 1. Overview EDA

🏪 Store & Item Landscape

Store distribution
Top stores by average sales
Top 30 items
Unit sales distribution

📈 Sales Patterns & Seasonality

Total sales over time
Average sales by day of week
Promotions vs. sales impact

🟩 2. Deep Dive EDA

🎁 Items

Top 40 families by total sales

💵 Oil Prices

Oil price timeseries

🎉 Holidays

Holidays by locale

🏙️ Stores

Stores per city

🛒 Train Dataset (Sales Deep Dive)

Daily total sales
Unit sales histogram (sample)
Top 30 items by number of rows
Top 30 stores by number of rows

💳 Transactions

Daily transaction totals

All plots shown here are generated by the EDA notebooks and saved under img/reports/eda_overview and img/reports/eda_deepdive.

About

Time series forecasting project based on the Kaggle Corporación Favorita dataset (Ecuador). Evaluated using NWRMSLE. Includes an interactive Streamlit benchmark app to explore, compare, and evaluate forecasting models.
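For reference, NWRMSLE (Normalized Weighted Root Mean Squared Logarithmic Error) can be computed as sketched below. This is an illustrative, standard-library-only version and not the repository's own implementation; the function name `nwrmsle` is an assumption. In the Kaggle competition definition, perishable items carry a weight of 1.25 and all other items a weight of 1.0. Clipping negative values (returns) to zero before taking the logarithm is a common convention that this sketch assumes.

```python
import math
from typing import Sequence


def nwrmsle(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    weights: Sequence[float],
) -> float:
    """sqrt( sum_i w_i * (ln(pred_i + 1) - ln(true_i + 1))^2 / sum_i w_i )."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("y_true, y_pred and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    squared_log_error = 0.0
    for t, p, w in zip(y_true, y_pred, weights):
        # Assumption: negative unit sales (returns) are clipped to zero.
        diff = math.log1p(max(p, 0.0)) - math.log1p(max(t, 0.0))
        squared_log_error += w * diff * diff
    return math.sqrt(squared_log_error / total_weight)
```

A perfect forecast yields 0.0, and because the error is measured on a log scale, the metric penalizes relative rather than absolute deviations.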
