Real-time supply chain intelligence platform with predictive analytics and automated data pipelines
Features • Architecture • Tech Stack • Quick Start • Screenshots
- Overview
- Features
- Architecture
- Tech Stack
- Project Structure
- Quick Start
- Components
- Screenshots
- Future Enhancements
- Contributing
- License
DataFlow Supply Chain Platform is an end-to-end data engineering project that demonstrates real-time supply chain intelligence capabilities. The platform processes and analyzes logistics data across orders, shipments, inventory, and suppliers to provide actionable insights for supply chain optimization.
- 🏗️ Production-grade infrastructure with Docker Compose
- 🔄 Real-time streaming with Apache Kafka
- ⚙️ Workflow orchestration with Apache Airflow
- 📊 Interactive dashboards with Streamlit
- 🗄️ Scalable data architecture with Star Schema
- 🔁 Complete ETL pipeline from ingestion to visualization
- ✅ Multi-source data ingestion (PostgreSQL, MongoDB, Kafka)
- ✅ Real-time streaming pipeline with Kafka producers/consumers
- ✅ Batch processing with scheduled ETL jobs
- ✅ Data quality validation and monitoring
- ✅ Star schema data warehouse design
- ✅ Real-time KPI dashboards (Orders, Revenue, Delivery Performance)
- ✅ Interactive charts (Bar, Pie, Line, Maps)
- ✅ Geospatial analysis with shipment tracking
- ✅ Auto-refreshing metrics from live data streams
- ✅ Containerized services with Docker Compose
- ✅ Scalable architecture supporting 1000+ orders/day
- ✅ Monitoring & logging capabilities
- ✅ CI/CD ready structure
```
┌─────────────────────────────────────────────────────┐
│                    DATA SOURCES                     │
├─────────────────────────────────────────────────────┤
│ PostgreSQL │ MongoDB │ Kafka Stream │ External APIs │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                   INGESTION LAYER                   │
├─────────────────────────────────────────────────────┤
│ • Kafka Producers (Real-time Orders)                │
│ • Data Generators (Suppliers, Products, Inventory)  │
│ • API Connectors                                    │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                  PROCESSING LAYER                   │
├─────────────────────────────────────────────────────┤
│ • Kafka Consumers (Stream Processing)               │
│ • Airflow DAGs (Batch ETL)                          │
│ • Data Transformation & Validation                  │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                    STORAGE LAYER                    │
├─────────────────────────────────────────────────────┤
│ • PostgreSQL (Raw, Staging, Analytics)              │
│ • MongoDB (Product Catalog)                         │
│ • Redis (Caching)                                   │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                 PRESENTATION LAYER                  │
├─────────────────────────────────────────────────────┤
│ • Streamlit Dashboard (Real-time Metrics)           │
│ • FastAPI (REST API - Optional)                     │
└─────────────────────────────────────────────────────┘
```
| Category | Technologies |
|---|---|
| Languages | Python 3.12 |
| Orchestration | Apache Airflow 2.10 |
| Streaming | Apache Kafka 7.5, Zookeeper |
| Databases | PostgreSQL 15, MongoDB 7.0, Redis 7.0 |
| Visualization | Streamlit, Plotly |
| Containerization | Docker, Docker Compose |
| Data Processing | Pandas, SQLAlchemy |
| Testing | Pytest |
Key dependencies (from requirements.txt):

```
confluent-kafka==2.3.0
sqlalchemy==2.0.25
pandas==2.1.4
streamlit==1.30.0
plotly==5.18.0
faker==22.0.0
```
```
dataflow-supply-chain/
├── src/
│   ├── ingestion/            # Data generation & loading
│   ├── streaming/            # Kafka producers/consumers
│   ├── transformation/       # Data transformations
│   ├── warehouse/            # Star schema logic
│   └── utils/                # DB connectors, logging
│
├── airflow/
│   └── dags/                 # ETL workflow definitions
│
├── dashboards/
│   ├── main_dashboard.py     # Streamlit dashboard
│   ├── pages/                # Multi-page layouts
│   └── components/           # Reusable UI components
│
├── kafka/
│   ├── producers/            # Order stream producers
│   └── consumers/            # Stream processors
│
├── infrastructure/
│   └── docker/
│       ├── docker-compose.yml
│       └── init-scripts/     # Database initialization
│
├── data/
│   ├── raw/                  # Bronze layer
│   ├── processed/            # Silver layer
│   └── analytics/            # Gold layer
│
├── config/
│   └── config.yaml           # Application configuration
│
├── producer.py               # Kafka order producer
├── consumer.py               # Kafka order consumer
├── check_data.py             # Database verification
└── README.md
```
- Docker & Docker Compose
- Python 3.11+
- 8GB RAM minimum
- WSL2 (for Windows users)
- Clone the repository

```bash
git clone https://github.com/yourusername/dataflow-supply-chain.git
cd dataflow-supply-chain
```

- Create a virtual environment

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Start the infrastructure

```bash
docker-compose up -d
```

Wait 2-3 minutes for all services to start.

- Initialize the database & load data

```bash
python load_data.py
```

- Start Kafka streaming (optional, for the real-time demo)

Terminal 1 (producer):

```bash
python3 producer.py
```

Terminal 2 (consumer):

```bash
python3 consumer.py
```

- Launch the dashboard

```bash
streamlit run dashboards/main_dashboard.py
```

Then open http://localhost:8501 in your browser.
- Generates realistic supply chain data (Orders, Products, Suppliers)
- Faker library for realistic names and locations
- Configurable data volumes
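The generator's output can be sketched with the standard library alone (the project itself uses Faker; the field names and value ranges below are illustrative assumptions, not the project's actual schema):

```python
import random
import uuid
from datetime import datetime, timezone

# Illustrative reference data; the real generator draws these from Faker.
CITIES = ["Riyadh", "Jeddah", "Dammam", "Dubai", "Doha"]
STATUSES = ["pending", "shipped", "delivered"]

def generate_order() -> dict:
    """Build one synthetic order record (field names are assumptions)."""
    return {
        "order_id": str(uuid.uuid4()),
        "product_id": random.randint(1, 500),
        "quantity": random.randint(1, 20),
        "unit_price": round(random.uniform(5.0, 500.0), 2),
        "destination": random.choice(CITIES),
        "status": random.choice(STATUSES),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    print(generate_order())
```

Volume is then just a loop count, which is how a "configurable data volumes" knob typically surfaces in config.yaml.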
- Producer: Streams new orders every 3 seconds
- Consumer: Processes and stores orders in PostgreSQL
- Kafka UI: Monitor topics at http://localhost:8080
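The producer loop can be sketched as follows, assuming a local broker on `localhost:9092` and a topic named `orders` (both assumptions; see `producer.py` for the actual implementation):

```python
import json
import time

def serialize_order(order: dict) -> bytes:
    """Encode an order dict as UTF-8 JSON for the Kafka topic."""
    return json.dumps(order, sort_keys=True).encode("utf-8")

def run_producer(generate_order, interval_s: float = 3.0) -> None:
    """Stream one synthetic order every `interval_s` seconds (3s in the demo)."""
    # Imported lazily: requires confluent-kafka and a running broker.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    while True:
        producer.produce("orders", value=serialize_order(generate_order()))
        producer.flush()  # block until the message is delivered
        time.sleep(interval_s)
```

The consumer side mirrors this: subscribe to `orders`, decode each JSON payload, and insert the row into PostgreSQL.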
- Airflow DAGs: Scheduled ETL pipelines
- Airflow UI: http://localhost:8081 (admin/admin)
- Daily transformations from Raw → Staging → Analytics
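The daily promotion from raw to staging can be sketched as a pure transform wired into a DAG; the DAG id, task id, and cleanup rules below are illustrative assumptions, not the project's actual DAG:

```python
from datetime import datetime

def clean_orders(rows: list[dict]) -> list[dict]:
    """Staging-layer cleanup: drop rows missing an order_id, normalize status."""
    return [
        {**row, "status": row.get("status", "pending").lower()}
        for row in rows
        if row.get("order_id")
    ]

def build_dag():
    """Wire the transform into an Airflow DAG (requires apache-airflow)."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="supply_chain_daily_etl",  # illustrative name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="raw_to_staging",
            python_callable=lambda: clean_orders([]),  # real task reads from raw tables
        )
    return dag
```

Keeping the transform a plain function makes it unit-testable without an Airflow scheduler running.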
- Star Schema design with fact and dimension tables
- Layers: Bronze (raw) → Silver (cleaned) → Gold (analytics)
- Optimized for analytical queries
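The fact/dimension split can be illustrated with an in-memory SQLite schema (the real warehouse is PostgreSQL; table and column names here are simplified assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: descriptive attributes, one row per product.
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    -- Fact: measures plus foreign keys into the dimensions.
    CREATE TABLE fact_orders (
        order_key   INTEGER PRIMARY KEY,
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        revenue     REAL
    );
""")
conn.execute("INSERT INTO dim_product VALUES (1, 'Pallet Jack', 'Equipment')")
conn.execute("INSERT INTO fact_orders VALUES (100, 1, 2, 450.0)")

# Typical analytical query: revenue by category via a fact-to-dimension join.
row = conn.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM fact_orders f JOIN dim_product d USING (product_key)
    GROUP BY d.category
""").fetchone()
```

Analytical queries stay fast because every join fans out from the narrow fact table to small dimension tables.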
- Real-time KPIs and metrics
- Interactive charts and maps
- Auto-refresh every 30 seconds
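The KPI tiles boil down to simple aggregations over order rows; a stdlib sketch of that computation (column names assumed), which a Streamlit page would render with `st.metric`:

```python
def compute_kpis(orders: list[dict]) -> dict:
    """Aggregate order rows into headline KPIs for the dashboard tiles."""
    delivered = [o for o in orders if o.get("status") == "delivered"]
    total = len(orders)
    return {
        "total_orders": total,
        "total_revenue": round(
            sum(o.get("quantity", 0) * o.get("unit_price", 0.0) for o in orders), 2
        ),
        "delivery_rate": round(len(delivered) / total, 3) if total else 0.0,
    }
```

In the dashboard this would run on each 30-second refresh against the analytics layer, with the charts built from the same query results via Plotly.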
Real-time supply chain metrics and KPIs
Live order processing
Automated ETL workflows
- ML-based demand forecasting
- Delivery time prediction model
- Anomaly detection system
- Great Expectations data quality framework
- Spark for large-scale processing
- CI/CD pipeline with GitHub Actions
- Cloud deployment (AWS/Azure)
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Buthainah
- GitHub: @Buthainah3524
For questions or collaboration opportunities, feel free to reach out!
- 📧 Email: Contact via GitHub
- 🐙 GitHub: @Buthainah3524
- Inspired by real-world supply chain challenges
- Built as a Data Engineering portfolio project
- Special thanks to the open-source community
⭐ Star this repo if you find it helpful!
Made with ❤️ for Data Engineering Excellence


