Real-time supply chain intelligence platform with predictive analytics and automated data pipelines
Features • Architecture • Tech Stack • Quick Start • Screenshots
- Overview
- Features
- Architecture
- Tech Stack
- Project Structure
- Quick Start
- Components
- Screenshots
- Future Enhancements
- Contributing
- License
DataFlow Supply Chain Platform is an end-to-end data engineering project that demonstrates real-time supply chain intelligence capabilities. The platform processes and analyzes logistics data across orders, shipments, inventory, and suppliers to provide actionable insights for supply chain optimization.
- 🏗️ Production-grade infrastructure with Docker Compose
- 🔄 Real-time streaming with Apache Kafka
- ⚙️ Workflow orchestration with Apache Airflow
- 📊 Interactive dashboards with Streamlit
- 🗄️ Scalable data architecture with Star Schema
- 🔁 Complete ETL pipeline from ingestion to visualization
- ✅ Multi-source data ingestion (PostgreSQL, MongoDB, Kafka)
- ✅ Real-time streaming pipeline with Kafka producers/consumers
- ✅ Batch processing with scheduled ETL jobs
- ✅ Data quality validation and monitoring
- ✅ Star schema data warehouse design
- ✅ Real-time KPI dashboards (Orders, Revenue, Delivery Performance)
- ✅ Interactive charts (Bar, Pie, Line, Maps)
- ✅ Geospatial analysis with shipment tracking
- ✅ Auto-refreshing metrics from live data streams
- ✅ Containerized services with Docker Compose
- ✅ Scalable architecture supporting 1000+ orders/day
- ✅ Monitoring & logging capabilities
- ✅ CI/CD ready structure
```
┌─────────────────────────────────────────────────────┐
│                    DATA SOURCES                     │
├─────────────────────────────────────────────────────┤
│ PostgreSQL │ MongoDB │ Kafka Stream │ External APIs │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                   INGESTION LAYER                   │
├─────────────────────────────────────────────────────┤
│ • Kafka Producers (Real-time Orders)                │
│ • Data Generators (Suppliers, Products, Inventory)  │
│ • API Connectors                                    │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                  PROCESSING LAYER                   │
├─────────────────────────────────────────────────────┤
│ • Kafka Consumers (Stream Processing)               │
│ • Airflow DAGs (Batch ETL)                          │
│ • Data Transformation & Validation                  │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                    STORAGE LAYER                    │
├─────────────────────────────────────────────────────┤
│ • PostgreSQL (Raw, Staging, Analytics)              │
│ • MongoDB (Product Catalog)                         │
│ • Redis (Caching)                                   │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                 PRESENTATION LAYER                  │
├─────────────────────────────────────────────────────┤
│ • Streamlit Dashboard (Real-time Metrics)           │
│ • FastAPI (REST API - Optional)                     │
└─────────────────────────────────────────────────────┘
```
| Category | Technologies |
|---|---|
| Languages | Python 3.12 |
| Orchestration | Apache Airflow 2.10 |
| Streaming | Apache Kafka 7.5, Zookeeper |
| Databases | PostgreSQL 15, MongoDB 7.0, Redis 7.0 |
| Visualization | Streamlit, Plotly |
| Containerization | Docker, Docker Compose |
| Data Processing | Pandas, SQLAlchemy |
| Testing | Pytest |
Key dependencies (from requirements.txt):

```
confluent-kafka==2.3.0
sqlalchemy==2.0.25
pandas==2.1.4
streamlit==1.30.0
plotly==5.18.0
faker==22.0.0
```
```
dataflow-supply-chain/
├── src/
│   ├── ingestion/            # Data generation & loading
│   ├── streaming/            # Kafka producers/consumers
│   ├── transformation/       # Data transformations
│   ├── warehouse/            # Star schema logic
│   └── utils/                # DB connectors, logging
│
├── airflow/
│   └── dags/                 # ETL workflow definitions
│
├── dashboards/
│   ├── main_dashboard.py     # Streamlit dashboard
│   ├── pages/                # Multi-page layouts
│   └── components/           # Reusable UI components
│
├── kafka/
│   ├── producers/            # Order stream producers
│   └── consumers/            # Stream processors
│
├── infrastructure/
│   └── docker/
│       ├── docker-compose.yml
│       └── init-scripts/     # Database initialization
│
├── data/
│   ├── raw/                  # Bronze layer
│   ├── processed/            # Silver layer
│   └── analytics/            # Gold layer
│
├── config/
│   └── config.yaml           # Application configuration
│
├── producer.py               # Kafka order producer
├── consumer.py               # Kafka order consumer
├── check_data.py             # Database verification
└── README.md
```
- Docker & Docker Compose
- Python 3.11+
- 8GB RAM minimum
- WSL2 (for Windows users)
- Clone the repository

```bash
git clone https://github.com/yourusername/dataflow-supply-chain.git
cd dataflow-supply-chain
```

- Create a virtual environment

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Start the infrastructure

```bash
docker-compose up -d
```

Wait 2-3 minutes for all services to start.

- Initialize the database & load data

```bash
python load_data.py
```

- Start Kafka streaming (optional, for the real-time demo)

Terminal 1 (producer):

```bash
python3 producer.py
```

Terminal 2 (consumer):

```bash
python3 consumer.py
```

- Launch the dashboard

```bash
streamlit run dashboards/main_dashboard.py
```

Then open http://localhost:8501 in your browser.
- Generates realistic supply chain data (Orders, Products, Suppliers)
- Faker library for realistic names and locations
- Configurable data volumes
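The generator's output can be sketched with the standard library alone (the project itself uses Faker; the field names and value ranges below are illustrative assumptions, not the project's actual schema):

```python
import random
import uuid
from datetime import datetime, timezone

# Illustrative reference data; the real generator draws these from Faker.
CITIES = ["Riyadh", "Jeddah", "Dammam", "Dubai", "Doha"]
STATUSES = ["pending", "shipped", "delivered"]

def generate_order() -> dict:
    """Build one synthetic order record (field names are assumptions)."""
    return {
        "order_id": str(uuid.uuid4()),
        "product_id": random.randint(1, 500),
        "quantity": random.randint(1, 20),
        "unit_price": round(random.uniform(5.0, 500.0), 2),
        "destination": random.choice(CITIES),
        "status": random.choice(STATUSES),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    print(generate_order())
```

Volume is then just a loop count, which is how a "configurable data volumes" knob typically surfaces in config.yaml.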
- Producer: Streams new orders every 3 seconds
- Consumer: Processes and stores orders in PostgreSQL
- Kafka UI: Monitor topics at http://localhost:8080
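The producer loop can be sketched as follows, assuming a local broker on `localhost:9092` and a topic named `orders` (both assumptions; see `producer.py` for the actual implementation):

```python
import json
import time

def serialize_order(order: dict) -> bytes:
    """Encode an order dict as UTF-8 JSON for the Kafka topic."""
    return json.dumps(order, sort_keys=True).encode("utf-8")

def run_producer(generate_order, interval_s: float = 3.0) -> None:
    """Stream one synthetic order every `interval_s` seconds (3s in the demo)."""
    # Imported lazily: requires confluent-kafka and a running broker.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    while True:
        producer.produce("orders", value=serialize_order(generate_order()))
        producer.flush()  # block until the message is delivered
        time.sleep(interval_s)
```

The consumer side mirrors this: subscribe to `orders`, decode each JSON payload, and insert the row into PostgreSQL.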
- Airflow DAGs: Scheduled ETL pipelines
- Airflow UI: http://localhost:8081 (admin/admin)
- Daily transformations from Raw → Staging → Analytics
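The daily promotion from raw to staging can be sketched as a pure transform wired into a DAG; the DAG id, task id, and cleanup rules below are illustrative assumptions, not the project's actual DAG:

```python
from datetime import datetime

def clean_orders(rows: list[dict]) -> list[dict]:
    """Staging-layer cleanup: drop rows missing an order_id, normalize status."""
    return [
        {**row, "status": row.get("status", "pending").lower()}
        for row in rows
        if row.get("order_id")
    ]

def build_dag():
    """Wire the transform into an Airflow DAG (requires apache-airflow)."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="supply_chain_daily_etl",  # illustrative name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="raw_to_staging",
            python_callable=lambda: clean_orders([]),  # real task reads from raw tables
        )
    return dag
```

Keeping the transform a plain function makes it unit-testable without an Airflow scheduler running.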
- Star Schema design with fact and dimension tables
- Layers: Bronze (raw) → Silver (cleaned) → Gold (analytics)
- Optimized for analytical queries
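The fact/dimension split can be illustrated with an in-memory SQLite schema (the real warehouse is PostgreSQL; table and column names here are simplified assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: descriptive attributes, one row per product.
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    -- Fact: measures plus foreign keys into the dimensions.
    CREATE TABLE fact_orders (
        order_key   INTEGER PRIMARY KEY,
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        revenue     REAL
    );
""")
conn.execute("INSERT INTO dim_product VALUES (1, 'Pallet Jack', 'Equipment')")
conn.execute("INSERT INTO fact_orders VALUES (100, 1, 2, 450.0)")

# Typical analytical query: revenue by category via a fact-to-dimension join.
row = conn.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM fact_orders f JOIN dim_product d USING (product_key)
    GROUP BY d.category
""").fetchone()
```

Analytical queries stay fast because every join fans out from the narrow fact table to small dimension tables.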
- Real-time KPIs and metrics
- Interactive charts and maps
- Auto-refresh every 30 seconds
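The KPI tiles boil down to simple aggregations over order rows; a stdlib sketch of that computation (column names assumed), which a Streamlit page would render with `st.metric`:

```python
def compute_kpis(orders: list[dict]) -> dict:
    """Aggregate order rows into headline KPIs for the dashboard tiles."""
    delivered = [o for o in orders if o.get("status") == "delivered"]
    total = len(orders)
    return {
        "total_orders": total,
        "total_revenue": round(
            sum(o.get("quantity", 0) * o.get("unit_price", 0.0) for o in orders), 2
        ),
        "delivery_rate": round(len(delivered) / total, 3) if total else 0.0,
    }
```

In the dashboard this would run on each 30-second refresh against the analytics layer, with the charts built from the same query results via Plotly.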
Real-time supply chain metrics and KPIs
Live order processing
Automated ETL workflows
- ML-based demand forecasting
- Delivery time prediction model
- Anomaly detection system
- Great Expectations data quality framework
- Spark for large-scale processing
- CI/CD pipeline with GitHub Actions
- Cloud deployment (AWS/Azure)
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Buthainah
- GitHub: @Buthainah3524
For questions or collaboration opportunities, feel free to reach out!
- 📧 Email: Contact via GitHub
- 🐙 GitHub: @Buthainah3524
- Inspired by real-world supply chain challenges
- Built as a Data Engineering portfolio project
- Special thanks to the open-source community
⭐ Star this repo if you find it helpful!
Made with ❤️ for Data Engineering Excellence


