Skip to content

johnathon-smith/walmart-sales-analytics-snowflake-dbt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Walmart Sales Analytics Pipeline

End-to-End Data Engineering Project Using AWS S3, Snowflake, and dbt


πŸ“Œ Executive Summary

This project demonstrates a production-style, end-to-end data engineering pipeline that ingests raw Walmart sales data from AWS S3 into Snowflake and transforms it using dbt into a dimensional warehouse model implementing both SCD Type 1 and SCD Type 2 logic.

The pipeline follows Medallion Architecture principles (Bronze β†’ Silver) and includes full DEV and PROD environment separation to simulate real-world enterprise deployment practices.

A CI/CD workflow is implemented using GitHub version control and dbt Cloud job orchestration:

  • All development occurs in a dedicated DEV environment (WALMART_DEV)
  • Changes are version-controlled in GitHub
  • Code is merged into the main branch
  • A production dbt job automatically pulls the latest code
  • Models and snapshots are executed in the PROD environment (WALMART_PROD)

This mirrors modern analytics engineering practices by combining:

  • Cloud data ingestion (AWS S3)
  • Scalable cloud data warehousing (Snowflake)
  • Transformation-as-code (dbt)
  • Environment isolation (DEV β†’ PROD promotion)
  • Continuous Integration and Continuous Deployment (CI/CD)
  • Business analytics delivery via Python (Seaborn and Plotly)

The result is a structured, production-ready analytics warehouse capable of supporting historical reporting, dimensional analysis, and executive-level business insights.


πŸ— Architecture Overview

Data Flow

  1. CSV files uploaded to AWS S3
  2. dbt loads raw data into Snowflake Bronze tables
  3. dbt transforms Bronze β†’ Silver dimensional model
  4. SCD Type 1 logic applied to dimension tables
  5. SCD Type 2 snapshot logic applied to fact table
  6. dbt Production job executes models in WALMART_PROD database
  7. Python script queries warehouse and generates visualizations

πŸ“Έ Architecture Diagram

Architecture Diagram


πŸ“‚ Repository Structure

walmart-sales-analytics-snowflake-dbt/  
β”‚  
β”œβ”€β”€ README.md  
β”‚  
β”œβ”€β”€ architecture/  
β”‚   └── architecture-diagram.png  
β”‚  
β”œβ”€β”€ dbt/  
β”‚   β”œβ”€β”€ dbt_project.yml  
β”‚   β”œβ”€β”€ package-lock.yml  
β”‚   β”œβ”€β”€ packages.yml  
β”‚   β”œβ”€β”€ models/  
β”‚   β”œβ”€β”€ snapshots/  
β”‚   └── macros/  
β”‚  
β”œβ”€β”€ snowflake/  
β”‚   β”œβ”€β”€ dev_setup.sql  
β”‚   └── prod_setup.sql  
β”‚  
β”œβ”€β”€ python_analytics/  
β”‚   └── generate_visualizations.py  
β”‚  
β”œβ”€β”€ visualizations/  
β”‚  
└── screenshots/  

πŸ“Š Source Data

The project uses three datasets:

  • stores.csv – Store information
  • department.csv – Department-level sales
  • fact.csv – Weekly store metrics (temperature, CPI, fuel price, etc.)

All files were uploaded into an AWS S3 bucket before ingestion.

πŸ“Έ S3 Upload

S3 Bucket


🧱 Data Warehouse Design

Medallion Architecture

Bronze Layer

  • Raw ingestion from S3
  • Minimal transformations
  • Mirrors source CSV structure

Bronze Tables


Silver Layer

  • Dimensional model
  • Two dimension tables
  • One fact table

Silver Tables


πŸ“Š Slowly Changing Dimensions

SCD Type 1 (Dimension Tables)

  • Implemented using MERGE logic in dbt
  • Overwrites historical values
  • Maintains current state

SCD Type 2 (Fact Table)

  • Implemented using dbt snapshot
  • Tracks historical changes
  • Maintains version history
  • Enables historical reporting

dbt Lineage


βš™οΈ Environment Configuration

Two isolated environments were implemented:

Environment Database
DEV WALMART_DEV
PROD WALMART_PROD

Schemas and table names remain consistent across environments.

πŸ“Έ DEV Environment

DEV Environment

πŸ“Έ PROD Environment

PROD Environment

πŸ“Έ dbt Production Job

dbt Job dbt Job


πŸ“ˆ Analytical Visualizations

After transformation, the warehouse powers multiple business insights generated using Python (Seaborn and Plotly).

Generated Visualizations

  • Weekly sales by store and holiday
  • Weekly sales by temperature and year
  • Weekly sales by store size
  • Weekly sales by store type and month
  • Markdown sales by year and store
  • Weekly sales by store type
  • Fuel price by year
  • Weekly sales by year
  • Weekly sales by month
  • Weekly sales by date
  • Weekly sales by CPI
  • Weekly sales by department

All visualization outputs are stored in the visualizations/ directory.


πŸ›  Technologies Used

  • AWS S3
  • Snowflake
  • dbt
  • Python
  • Seaborn
  • Plotly
  • Dimensional Modeling
  • Medallion Architecture
  • SCD Type 1
  • SCD Type 2

🎯 Engineering Concepts Demonstrated

  • Cloud ingestion workflow
  • Medallion architecture implementation
  • Dimensional modeling best practices
  • Snapshot-based historical tracking
  • Environment isolation (DEV vs PROD)
  • Production job orchestration
  • Analytical data product delivery

πŸ’‘ Business Value

This project demonstrates how raw retail sales data can be transformed into a structured analytics warehouse capable of supporting:

  • Historical trend analysis
  • Store performance comparison
  • Seasonality insights
  • External factor impact analysis (CPI, fuel price, temperature)
  • Executive-level reporting

πŸš€ Future Enhancements

  • Implement GitHub Actions for automated dbt test execution on pull requests
  • Add automated data quality gates prior to production deployment
  • Provision Snowflake infrastructure using Terraform
  • Implement branch-based deployment strategies
  • Add BI dashboard layer (Power BI / Tableau / Streamlit)

πŸ‘€ Author

Johnathon Smith
Data Engineer focused on building scalable cloud data platforms using AWS, Snowflake, and dbt.


About

End-to-end data engineering project using AWS S3, Snowflake, and dbt to implement Medallion Architecture with SCD Type 1 & Type 2 logic on Walmart sales data, followed by analytical visualizations using Seaborn and Plotly.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages