Buthainah3524/supply-chain-intelligence-platform

🚀 DataFlow Supply Chain Intelligence Platform

Real-time supply chain intelligence platform with predictive analytics and automated data pipelines

Features • Architecture • Tech Stack • Quick Start • Screenshots


🎯 Overview

DataFlow Supply Chain Platform is an end-to-end data engineering project that demonstrates real-time supply chain intelligence capabilities. The platform processes and analyzes logistics data across orders, shipments, inventory, and suppliers to provide actionable insights for supply chain optimization.

Key Highlights

  • 🏗️ Production-grade infrastructure with Docker Compose
  • 📊 Real-time streaming with Apache Kafka
  • ⚙️ Workflow orchestration with Apache Airflow
  • 📈 Interactive dashboards with Streamlit
  • 🗄️ Scalable data architecture with Star Schema
  • 🔄 Complete ETL pipeline from ingestion to visualization

✨ Features

Data Engineering

  • ✅ Multi-source data ingestion (PostgreSQL, MongoDB, Kafka)
  • ✅ Real-time streaming pipeline with Kafka producers/consumers
  • ✅ Batch processing with scheduled ETL jobs
  • ✅ Data quality validation and monitoring
  • ✅ Star schema data warehouse design
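The data quality validation step listed above could look roughly like the following. This is a minimal sketch, not the project's actual validator: the field names (`order_id`, `product_id`, `quantity`, `unit_price`, `order_date`) and the rules themselves are assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical order fields; the real schema may differ.
REQUIRED_FIELDS = ("order_id", "product_id", "quantity", "unit_price", "order_date")


def validate_order(order: dict) -> list[str]:
    """Return a list of problems found in one order record (empty if it passes)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if order.get(field) in (None, ""):
            problems.append(f"missing {field}")

    quantity = order.get("quantity")
    if isinstance(quantity, (int, float)) and quantity <= 0:
        problems.append("quantity must be positive")

    price = order.get("unit_price")
    if isinstance(price, (int, float)) and price < 0:
        problems.append("unit_price must not be negative")

    order_date = order.get("order_date")
    if isinstance(order_date, str) and order_date:
        try:
            datetime.fromisoformat(order_date)
        except ValueError:
            problems.append("order_date is not ISO 8601")
    return problems


good = {"order_id": "O-1", "product_id": "P-9", "quantity": 3,
        "unit_price": 19.99, "order_date": "2024-01-15"}
bad = {"order_id": "O-2", "product_id": "", "quantity": 0,
       "unit_price": 5.0, "order_date": "15/01/2024"}

print(validate_order(good))
print(validate_order(bad))
```

Returning a list of problems rather than raising on the first one makes it easy to count failures per rule, which is what a monitoring view would typically need.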

Analytics & Visualization

  • ✅ Real-time KPI dashboards (Orders, Revenue, Delivery Performance)
  • ✅ Interactive charts (Bar, Pie, Line, Maps)
  • ✅ Geospatial analysis with shipment tracking
  • ✅ Auto-refreshing metrics from live data streams
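As an illustration of the three KPIs named above (orders, revenue, delivery performance), here is a small sketch in plain Python. The record layout (`revenue`, `promised`, `delivered`) and the definition of "on time" as delivered on or before the promised date are assumptions, not taken from the project's code.

```python
from datetime import date

# Hypothetical sample rows; the real dashboard reads from PostgreSQL.
orders = [
    {"order_id": "O-1", "revenue": 120.0, "promised": date(2024, 1, 10), "delivered": date(2024, 1, 9)},
    {"order_id": "O-2", "revenue": 80.0, "promised": date(2024, 1, 12), "delivered": date(2024, 1, 14)},
    {"order_id": "O-3", "revenue": 200.0, "promised": date(2024, 1, 15), "delivered": None},
    {"order_id": "O-4", "revenue": 50.0, "promised": date(2024, 1, 16), "delivered": date(2024, 1, 16)},
]


def compute_kpis(rows):
    """Aggregate order count, revenue, and on-time delivery rate."""
    delivered = [r for r in rows if r["delivered"] is not None]
    on_time = [r for r in delivered if r["delivered"] <= r["promised"]]
    return {
        "total_orders": len(rows),
        "total_revenue": sum(r["revenue"] for r in rows),
        # Undelivered orders are excluded from the rate rather than counted as late.
        "on_time_rate": len(on_time) / len(delivered) if delivered else None,
    }


kpis = compute_kpis(orders)
print(kpis)
```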

Infrastructure

  • ✅ Containerized services with Docker Compose
  • ✅ Scalable architecture supporting 1000+ orders/day
  • ✅ Monitoring & logging capabilities
  • ✅ CI/CD ready structure
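To put the 1000+ orders/day figure in context, the arithmetic below compares it with the 3-second producer interval quoted under Components. It is only a back-of-the-envelope check and assumes the demo producer runs continuously for a full day.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

target_orders_per_day = 1000            # figure from the feature list
producer_interval_s = 3                 # figure from the Components section

# Average spacing between orders needed to reach the target.
seconds_per_order_at_target = SECONDS_PER_DAY / target_orders_per_day

# Orders emitted per day by a producer that never stops.
demo_orders_per_day = SECONDS_PER_DAY // producer_interval_s

print(seconds_per_order_at_target)
print(demo_orders_per_day)
```

Under that assumption the demo producer alone generates well above the stated daily target.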

🏗️ Architecture

┌──────────────────────────────────────────────────────────────────┐
│                           DATA SOURCES                           │
├──────────────────────────────────────────────────────────────────┤
│  PostgreSQL  │  MongoDB  │  Kafka Stream  │  External APIs       │
└────────┬─────────────────────────────────────────────────────────┘
         │
         ↓
┌──────────────────────────────────────────────────────────────────┐
│                         INGESTION LAYER                          │
├──────────────────────────────────────────────────────────────────┤
│  • Kafka Producers (Real-time Orders)                            │
│  • Data Generators (Suppliers, Products, Inventory)              │
│  • API Connectors                                                │
└────────┬─────────────────────────────────────────────────────────┘
         │
         ↓
┌──────────────────────────────────────────────────────────────────┐
│                         PROCESSING LAYER                         │
├──────────────────────────────────────────────────────────────────┤
│  • Kafka Consumers (Stream Processing)                           │
│  • Airflow DAGs (Batch ETL)                                      │
│  • Data Transformation & Validation                              │
└────────┬─────────────────────────────────────────────────────────┘
         │
         ↓
┌──────────────────────────────────────────────────────────────────┐
│                          STORAGE LAYER                           │
├──────────────────────────────────────────────────────────────────┤
│  • PostgreSQL (Raw, Staging, Analytics)                          │
│  • MongoDB (Product Catalog)                                     │
│  • Redis (Caching)                                               │
└────────┬─────────────────────────────────────────────────────────┘
         │
         ↓
┌──────────────────────────────────────────────────────────────────┐
│                        PRESENTATION LAYER                        │
├──────────────────────────────────────────────────────────────────┤
│  • Streamlit Dashboard (Real-time Metrics)                       │
│  • FastAPI (REST API - Optional)                                 │
└──────────────────────────────────────────────────────────────────┘

🛠️ Tech Stack

Core Technologies

Category            Technologies
Languages           Python 3.12
Orchestration       Apache Airflow 2.10
Streaming           Apache Kafka (Confluent Platform 7.5), ZooKeeper
Databases           PostgreSQL 15, MongoDB 7.0, Redis 7.0
Visualization       Streamlit, Plotly
Containerization    Docker, Docker Compose
Data Processing     Pandas, SQLAlchemy
Testing             Pytest

Python Libraries

confluent-kafka==2.3.0
sqlalchemy==2.0.25
pandas==2.1.4
streamlit==1.30.0
plotly==5.18.0
faker==22.0.0

📁 Project Structure

dataflow-supply-chain/
├── src/
│   ├── ingestion/          # Data generation & loading
│   ├── streaming/          # Kafka producers/consumers
│   ├── transformation/     # Data transformations
│   ├── warehouse/          # Star schema logic
│   └── utils/              # DB connectors, logging
│
├── airflow/
│   └── dags/               # ETL workflow definitions
│
├── dashboards/
│   ├── main_dashboard.py   # Streamlit dashboard
│   ├── pages/              # Multi-page layouts
│   └── components/         # Reusable UI components
│
├── kafka/
│   ├── producers/          # Order stream producers
│   └── consumers/          # Stream processors
│
├── infrastructure/
│   └── docker/
│       ├── docker-compose.yml
│       └── init-scripts/   # Database initialization
│
├── data/
│   ├── raw/                # Bronze layer
│   ├── processed/          # Silver layer
│   └── analytics/          # Gold layer
│
├── config/
│   └── config.yaml         # Application configuration
│
├── producer.py             # Kafka order producer
├── consumer.py             # Kafka order consumer
├── check_data.py           # Database verification
└── README.md

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.11+
  • 8GB RAM minimum
  • WSL2 (for Windows users)

Installation

  1. Clone the repository

     git clone https://github.com/yourusername/dataflow-supply-chain.git
     cd dataflow-supply-chain

  2. Create virtual environment

     python3 -m venv venv
     source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies

     pip install -r requirements.txt

  4. Start infrastructure

     docker-compose up -d

     Wait 2-3 minutes for all services to start.

  5. Initialize database & load data

     python load_data.py

  6. Start Kafka streaming (Optional - for real-time demo)

     Terminal 1 - Producer:

     python3 producer.py

     Terminal 2 - Consumer:

     python3 consumer.py

  7. Launch dashboard

     streamlit run dashboards/main_dashboard.py

     Open browser: http://localhost:8501
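Instead of waiting a fixed 2-3 minutes after starting the containers, one option is to poll the service ports until they accept connections. The sketch below uses only the standard library. Ports 8080 (Kafka UI) and 8081 (Airflow UI) come from this README; 5432 and 9092 are the usual PostgreSQL and Kafka defaults and are an assumption about how docker-compose.yml maps them.

```python
import socket
import time

# 8080 and 8081 are listed in this README; 5432 and 9092 are assumed defaults.
SERVICES = {
    "postgres": 5432,
    "kafka": 9092,
    "kafka-ui": 8080,
    "airflow": 8081,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for(services, host="localhost", deadline_s=180, poll_s=5):
    """Poll until every port accepts connections or the deadline passes.

    Returns the services that are still unreachable (empty dict on success).
    """
    pending = dict(services)
    stop_at = time.monotonic() + deadline_s
    while pending and time.monotonic() < stop_at:
        pending = {name: port for name, port in pending.items()
                   if not port_open(host, port)}
        if pending:
            time.sleep(poll_s)
    return pending
```

Calling `wait_for(SERVICES)` after `docker-compose up -d` returns an empty dict once everything is reachable, or the names and ports that never came up. An open port only shows that the process is listening, not that the service has finished initializing, so this is a lower bound on readiness.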


🔧 Components

1. Data Generation

  • Generates realistic supply chain data (Orders, Products, Suppliers)
  • Faker library for realistic names and locations
  • Configurable data volumes

2. Real-time Streaming

  • Producer: Streams new orders every 3 seconds
  • Consumer: Processes and stores orders in PostgreSQL
  • Kafka UI: Monitor topics at http://localhost:8080
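The producer and consumer roles above can be sketched without a running broker. The example below is an illustration under stated assumptions, not the contents of producer.py or consumer.py: the event fields are hypothetical, and a plain dict stands in for the PostgreSQL table. With confluent-kafka, the encoded pair would be passed to `Producer.produce(topic, key=key, value=value)`.

```python
import json
import random
import uuid
from datetime import datetime, timezone


def make_order_event(rng):
    """Build one order event (hypothetical field names)."""
    quantity = rng.randint(1, 10)
    unit_price = round(rng.uniform(5, 500), 2)
    return {
        "order_id": str(uuid.uuid4()),
        "product_id": f"P-{rng.randint(1, 50):03d}",
        "quantity": quantity,
        "unit_price": unit_price,
        "total_amount": round(quantity * unit_price, 2),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def encode(event):
    """Serialize to the (key, value) byte pair a Kafka producer sends."""
    return event["order_id"].encode("utf-8"), json.dumps(event).encode("utf-8")


def decode(value):
    """Inverse of encode() for the message value."""
    return json.loads(value.decode("utf-8"))


def handle_message(value, store):
    """Consumer side: decode and upsert by order_id, so replays are harmless."""
    order = decode(value)
    store[order["order_id"]] = order
    return order


rng = random.Random(42)
table = {}  # stand-in for the PostgreSQL orders table
for _ in range(3):
    key, value = encode(make_order_event(rng))
    handle_message(value, table)
    handle_message(value, table)  # simulated redelivery of the same message

print(len(table))
```

Keying the write on `order_id` matters because a Kafka consumer can receive the same message more than once after a restart or rebalance; an upsert keeps the stored result the same either way.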

3. Batch Processing

  • Airflow DAGs: Scheduled ETL pipelines
  • Airflow UI: http://localhost:8081 (admin/admin)
  • Daily transformations from Raw → Staging → Analytics
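The Raw, Staging, and Analytics hops can be pictured as two small functions, each of which would plausibly become one task in an Airflow DAG. This is a sketch in plain Python with made-up rows and cleaning rules (drop duplicates, drop unparseable amounts); the project's actual transformations are not shown in this README.

```python
from collections import defaultdict

# Hypothetical raw rows, as they might arrive from ingestion.
raw_orders = [
    {"order_id": "O-1", "order_date": "2024-01-15", "amount": "120.50"},
    {"order_id": "O-1", "order_date": "2024-01-15", "amount": "120.50"},  # duplicate
    {"order_id": "O-2", "order_date": "2024-01-15", "amount": "80"},
    {"order_id": "O-3", "order_date": "2024-01-16", "amount": "not-a-number"},
    {"order_id": "O-4", "order_date": "2024-01-16", "amount": "45.25"},
]


def to_staging(rows):
    """Raw -> Staging: drop duplicates and rows whose amount cannot be parsed."""
    seen, staged = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        seen.add(row["order_id"])
        staged.append({**row, "amount": amount})
    return staged


def to_analytics(staged):
    """Staging -> Analytics: aggregate order count and revenue per day."""
    daily = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in staged:
        day = daily[row["order_date"]]
        day["orders"] += 1
        day["revenue"] += row["amount"]
    return dict(daily)


staged = to_staging(raw_orders)
daily = to_analytics(staged)
print(daily)
```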

4. Data Warehouse

  • Star Schema design with fact and dimension tables
  • Layers: Bronze (raw) → Silver (cleaned) → Gold (analytics)
  • Optimized for analytical queries
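A star schema keeps measures in a central fact table and descriptive attributes in dimension tables that the fact rows point to. The sketch below shows the idea with an in-memory SQLite database standing in for PostgreSQL; the table and column names (`fact_orders`, `dim_product`, `dim_supplier`) and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_supplier (
    supplier_key  INTEGER PRIMARY KEY,
    supplier_name TEXT,
    country       TEXT
);
CREATE TABLE fact_orders (
    order_id     TEXT PRIMARY KEY,
    product_key  INTEGER REFERENCES dim_product(product_key),
    supplier_key INTEGER REFERENCES dim_supplier(supplier_key),
    quantity     INTEGER,
    total_amount REAL
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Pallet Jack", "Equipment"), (2, "Shrink Wrap", "Packaging")])
conn.executemany("INSERT INTO dim_supplier VALUES (?, ?, ?)",
                 [(1, "Acme Logistics", "US"), (2, "Nordic Freight", "SE")])
conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?)",
                 [("O-1", 1, 1, 2, 600.0),
                  ("O-2", 2, 1, 10, 150.0),
                  ("O-3", 2, 2, 4, 60.0)])

# A typical analytical query: one join from the fact table to a dimension.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.total_amount)
    FROM fact_orders AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(revenue_by_category)
```

Each analytical question then needs only a join from the fact table to the dimensions it filters or groups by, which is the sense in which the design suits analytical queries.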

5. Dashboard

  • Real-time KPIs and metrics
  • Interactive charts and maps
  • Auto-refresh every 30 seconds

📸 Screenshots

Dashboard Overview

Real-time supply chain metrics and KPIs

[Screenshot: Dashboard]

Kafka Streaming

Live order processing

[Screenshot: Streaming]

Airflow Pipeline

Automated ETL workflows

[Screenshot: Airflow]


🔮 Future Enhancements

  • ML-based demand forecasting
  • Delivery time prediction model
  • Anomaly detection system
  • Great Expectations data quality framework
  • Spark for large-scale processing
  • CI/CD pipeline with GitHub Actions
  • Cloud deployment (AWS/Azure)

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👤 Author

Buthainah


📧 Contact

For questions or collaboration opportunities, feel free to reach out!


🙏 Acknowledgments

  • Inspired by real-world supply chain challenges
  • Built as a Data Engineering portfolio project
  • Special thanks to the open-source community

⭐ Star this repo if you find it helpful!

Made with ❤️ for Data Engineering Excellence
