This project demonstrates a production-style, end-to-end data engineering pipeline that ingests raw Walmart sales data from AWS S3 into Snowflake and transforms it using dbt into a dimensional warehouse model implementing both SCD Type 1 and SCD Type 2 logic.
The pipeline follows Medallion Architecture principles (Bronze → Silver) and includes full DEV and PROD environment separation to simulate real-world enterprise deployment practices.
A CI/CD workflow is implemented using GitHub version control and dbt Cloud job orchestration:
- All development occurs in a dedicated DEV environment (WALMART_DEV)
- Changes are version-controlled in GitHub
- Code is merged into the main branch
- A production dbt job automatically pulls the latest code
- Models and snapshots are executed in the PROD environment (WALMART_PROD)
This mirrors modern analytics engineering practices by combining:
- Cloud data ingestion (AWS S3)
- Scalable cloud data warehousing (Snowflake)
- Transformation-as-code (dbt)
- Environment isolation (DEV → PROD promotion)
- Continuous Integration and Continuous Deployment (CI/CD)
- Business analytics delivery via Python (Seaborn and Plotly)
The result is a structured, production-ready analytics warehouse capable of supporting historical reporting, dimensional analysis, and executive-level business insights.
The end-to-end pipeline flow:
- CSV files uploaded to AWS S3
- dbt loads raw data into Snowflake Bronze tables
- dbt transforms Bronze → Silver dimensional model
- SCD Type 1 logic applied to dimension tables
- SCD Type 2 snapshot logic applied to fact table
- dbt Production job executes models in WALMART_PROD database
- Python script queries warehouse and generates visualizations
The repository is organized as follows:
```
walmart-sales-analytics-snowflake-dbt/
│
├── README.md
│
├── architecture/
│   └── architecture-diagram.png
│
├── dbt/
│   ├── dbt_project.yml
│   ├── package-lock.yml
│   ├── packages.yml
│   ├── models/
│   ├── snapshots/
│   └── macros/
│
├── snowflake/
│   ├── dev_setup.sql
│   └── prod_setup.sql
│
├── python_analytics/
│   └── generate_visualizations.py
│
├── visualizations/
│
└── screenshots/
```
The project uses three datasets:
- stores.csv – Store information
- department.csv – Department-level sales
- fact.csv – Weekly store metrics (temperature, CPI, fuel price, etc.)
All files were uploaded into an AWS S3 bucket before ingestion.
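For context, the snippet below sketches one common way to expose those files to Snowflake through an external stage and load them into Bronze tables. The bucket path, credentials, stage, table, and column names are all placeholders; the repo may instead handle this step through dbt.
```sql
-- Illustrative only: bucket, stage, table, and column names are placeholders.
CREATE OR REPLACE STAGE WALMART_DEV.BRONZE.WALMART_S3_STAGE
  URL = 's3://<your-bucket>/walmart/'
  CREDENTIALS = (AWS_KEY_ID = '<aws_key_id>' AWS_SECRET_KEY = '<aws_secret_key>')
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Target Bronze table mirroring stores.csv (assumed columns).
CREATE TABLE IF NOT EXISTS WALMART_DEV.BRONZE.STORES_RAW (
  store_id   INTEGER,
  store_type VARCHAR,
  store_size INTEGER
);

-- Load one of the three CSVs from the stage.
COPY INTO WALMART_DEV.BRONZE.STORES_RAW
  FROM @WALMART_DEV.BRONZE.WALMART_S3_STAGE/stores.csv;
```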
The Bronze layer:
- Raw ingestion from S3
- Minimal transformations
- Mirrors the source CSV structure (see the sketch below)
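In dbt, a Bronze model is typically just a thin pass-through over the raw table. A minimal sketch, assuming a source named walmart_raw and the column names shown (both illustrative, not taken from the repo):
```sql
-- models/bronze/bronze_stores.sql (illustrative path, source, and column names)
-- Thin pass-through that mirrors the raw stores.csv structure.
select
    store_id,
    store_type,
    store_size
from {{ source('walmart_raw', 'stores_raw') }}
```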
The Silver layer:
- Dimensional model (example below)
- Two dimension tables
- One fact table
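A Silver dimension then lightly reshapes the Bronze data. A hypothetical dim_store sketch (the size banding is an illustrative enrichment, not necessarily present in the repo):
```sql
-- models/silver/dim_store.sql (illustrative names)
select
    store_id,
    store_type,
    store_size,
    case
        when store_size >= 150000 then 'Large'
        when store_size >= 75000  then 'Medium'
        else 'Small'
    end as size_band
from {{ ref('bronze_stores') }}
```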
SCD Type 1 (dimension tables):
- Implemented using MERGE logic in dbt (sketched below)
- Overwrites historical values
- Maintains only the current state
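In dbt, Type 1 behavior is commonly expressed as an incremental model with the merge strategy: on Snowflake this compiles to a MERGE statement, so matched rows are overwritten in place and no history is retained. A sketch building on the dimension above (model, key, and column names are assumptions):
```sql
-- Hypothetical SCD Type 1 dimension: matched rows are simply overwritten by the MERGE.
{{ config(
    materialized = 'incremental',
    incremental_strategy = 'merge',
    unique_key = 'store_id'
) }}

select
    store_id,
    store_type,
    store_size,
    current_timestamp() as updated_at
from {{ ref('bronze_stores') }}
```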
SCD Type 2 (fact table):
- Implemented using dbt snapshots (sketched below)
- Tracks historical changes
- Maintains version history
- Enables historical reporting
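dbt snapshots implement Type 2 by adding dbt_valid_from / dbt_valid_to columns and closing out a row whenever a tracked value changes, which is what enables the historical reporting above. A sketch assuming a check-strategy snapshot over a fact model named fact_weekly_sales (all names and tracked columns are illustrative):
```sql
-- snapshots/fact_weekly_sales_snapshot.sql (illustrative names and columns)
{% snapshot fact_weekly_sales_snapshot %}

{{ config(
    target_schema = 'snapshots',
    unique_key = "store_id || '-' || week_date",
    strategy = 'check',
    check_cols = ['weekly_sales', 'temperature', 'fuel_price', 'cpi']
) }}

select * from {{ ref('fact_weekly_sales') }}

{% endsnapshot %}
```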
Two isolated environments were implemented:
| Environment | Database |
|---|---|
| DEV | WALMART_DEV |
| PROD | WALMART_PROD |
Schemas and table names remain consistent across environments.
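The dev_setup.sql and prod_setup.sql scripts provision these databases; a minimal sketch of what they might contain (schema names are assumptions):
```sql
-- Sketch of snowflake/dev_setup.sql (schema names are assumptions)
CREATE DATABASE IF NOT EXISTS WALMART_DEV;
CREATE SCHEMA IF NOT EXISTS WALMART_DEV.BRONZE;
CREATE SCHEMA IF NOT EXISTS WALMART_DEV.SILVER;

-- snowflake/prod_setup.sql would mirror the same objects under WALMART_PROD
CREATE DATABASE IF NOT EXISTS WALMART_PROD;
CREATE SCHEMA IF NOT EXISTS WALMART_PROD.BRONZE;
CREATE SCHEMA IF NOT EXISTS WALMART_PROD.SILVER;
```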
After transformation, the warehouse powers multiple business insights generated using Python (Seaborn and Plotly).
- Weekly sales by store and holiday
- Weekly sales by temperature and year
- Weekly sales by store size
- Weekly sales by store type and month
- Markdown sales by year and store
- Weekly sales by store type
- Fuel price by year
- Weekly sales by year
- Weekly sales by month
- Weekly sales by date
- Weekly sales by CPI
- Weekly sales by department
All visualization outputs are stored in the visualizations/ directory.
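Each chart is driven by a straightforward aggregate over the Silver layer. A hypothetical example for the "weekly sales by store type and month" view (object and column names are illustrative, not the repo's actual model names):
```sql
-- Hypothetical query behind one visualization; object and column names are illustrative.
select
    d.store_type,
    month(f.week_date)  as sales_month,
    sum(f.weekly_sales) as total_weekly_sales
from WALMART_PROD.SILVER.FACT_WEEKLY_SALES f
join WALMART_PROD.SILVER.DIM_STORE d
  on f.store_id = d.store_id
group by 1, 2
order by 1, 2;
```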
Technologies and concepts used:
- AWS S3
- Snowflake
- dbt
- Python
- Seaborn
- Plotly
- Dimensional Modeling
- Medallion Architecture
- SCD Type 1
- SCD Type 2
Key practices demonstrated:
- Cloud ingestion workflow
- Medallion architecture implementation
- Dimensional modeling best practices
- Snapshot-based historical tracking
- Environment isolation (DEV vs PROD)
- Production job orchestration
- Analytical data product delivery
This project demonstrates how raw retail sales data can be transformed into a structured analytics warehouse capable of supporting:
- Historical trend analysis
- Store performance comparison
- Seasonality insights
- External factor impact analysis (CPI, fuel price, temperature)
- Executive-level reporting
Planned future enhancements:
- Implement GitHub Actions for automated dbt test execution on pull requests
- Add automated data quality gates prior to production deployment
- Provision Snowflake infrastructure using Terraform
- Implement branch-based deployment strategies
- Add BI dashboard layer (Power BI / Tableau / Streamlit)
Johnathon Smith
Data Engineer focused on building scalable cloud data platforms using AWS, Snowflake, and dbt.