# Multi-source Ingestion (GitHub + ADLS) → ADLS Bronze Layer

Dynamic ETL • GitOps-Controlled • Portfolio-Ready Azure Project
- 🔥 Project Highlights
- 🚀 Overview
- 🧠 Architecture
- 📸 Screenshots
- 🏗 Pipelines Explained
- 🗂 Folder Structure
- 🧪 How to Run
- 🧰 Tools & Skills
- 📈 Resume Value
- 📬 Contact
## 🔥 Project Highlights

This project demonstrates real-world Azure Data Engineering ingestion patterns, including:
- 🔹 Multi-source ingestion: GitHub Raw + ADLS
- 🔹 Two independent dynamic pipelines
- GitToBronze — GitHub → Bronze
- DatalakeIngestion — ADLS → Bronze
- 🔹 Parameterised datasets for scalable ETL
- 🔹 GitOps-managed ADF artifacts
- 🔹 Bronze zone design aligned with Data Lakehouse principles
- 🔹 Beginner-friendly yet enterprise-style
- 🔹 Clickable screenshots & architecture diagrams
- 🔹 Fully documented for portfolio & recruiters
SEO Keywords:
Azure Data Engineer, ADF Pipeline, ADLS Gen2, Data Lake, GitOps, ETL Project, Azure Portfolio, Source to Bronze, Data Ingestion, Dynamic Pipelines, Cloud Engineering
## 🚀 Overview

This project builds a clean, scalable Source → Bronze ingestion framework.
## 🧠 Architecture

Data flows from two sources into a single Bronze landing zone:

- GitHub Raw folder: WHR datasets (`WHR_2015.csv` to `WHR_2023.csv`)
- ADLS source container: `nocs.csv`
- ADLS Bronze layer: landing zone for all ingested files
| Pipeline | Source | Destination | Purpose |
|---|---|---|---|
| GitToBronze | GitHub HTTP | ADLS Bronze | Ingest World Happiness Report (WHR) datasets (2015–2023) |
| DatalakeIngestion | ADLS Source | ADLS Bronze | Recursively ingest ADLS files |
Both pipelines together form a multi-source ingestion layer suitable for real companies.
## 🏗 Pipelines Explained

### GitToBronze

Purpose:
Ingest year-wise WHR datasets from GitHub Raw into the ADLS Bronze layer.
Steps:
- Lookup JSON from GitHub Raw
- ForEach over file list
- Copy activity: HTTP source → ADLS sink (CSV/Parquet)
Use case: Easily scalable ingestion for large Git-based data repositories.
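The Lookup → ForEach → Copy flow above can be sketched as a simplified ADF pipeline definition. This is an illustrative sketch only: the dataset names (`ds_github_filelist`, `ds_github_csv`, `ds_bronze_csv`) and the `fileName`/`relativeUrl` parameters are assumptions, not the repository's actual artifact names.

```json
{
  "name": "GitToBronze",
  "properties": {
    "activities": [
      {
        "name": "LookupFileList",
        "type": "Lookup",
        "typeProperties": {
          "source": { "type": "JsonSource" },
          "dataset": { "referenceName": "ds_github_filelist", "type": "DatasetReference" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachFile",
        "type": "ForEach",
        "dependsOn": [ { "activity": "LookupFileList", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('LookupFileList').output.value", "type": "Expression" },
          "activities": [
            {
              "name": "CopyGitHubToBronze",
              "type": "Copy",
              "inputs": [
                { "referenceName": "ds_github_csv", "type": "DatasetReference",
                  "parameters": { "relativeUrl": "@item().fileName" } }
              ],
              "outputs": [
                { "referenceName": "ds_bronze_csv", "type": "DatasetReference",
                  "parameters": { "fileName": "@item().fileName" } }
              ],
              "typeProperties": {
                "source": { "type": "DelimitedTextSource" },
                "sink": { "type": "DelimitedTextSink" }
              }
            }
          ]
        }
      }
    ]
  }
}
```

Because the file list drives the ForEach, adding a new year's file only requires updating the JSON file list in GitHub, not the pipeline itself.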
### DatalakeIngestion

Purpose:
Scan an ADLS source folder and ingest matching files into the Bronze layer.
Steps:
- Get Metadata (recursive)
- ForEach over items
- IfCondition (match specific file names)
- Copy activity: ADLS source → ADLS Bronze sink
Use case: Automating ingestion of new drops in a source folder.
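A minimal sketch of this Get Metadata → ForEach → IfCondition pattern in ADF pipeline JSON. The dataset names and the `.csv` file-name filter are illustrative assumptions; the real pipeline may match different names.

```json
{
  "name": "DatalakeIngestion",
  "properties": {
    "activities": [
      {
        "name": "GetSourceFiles",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": { "referenceName": "ds_adls_source_folder", "type": "DatasetReference" },
          "fieldList": [ "childItems" ]
        }
      },
      {
        "name": "ForEachItem",
        "type": "ForEach",
        "dependsOn": [ { "activity": "GetSourceFiles", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('GetSourceFiles').output.childItems", "type": "Expression" },
          "activities": [
            {
              "name": "IfTargetFile",
              "type": "IfCondition",
              "typeProperties": {
                "expression": { "value": "@endswith(item().name, '.csv')", "type": "Expression" },
                "ifTrueActivities": [
                  {
                    "name": "CopyToBronze",
                    "type": "Copy",
                    "inputs": [
                      { "referenceName": "ds_adls_source_file", "type": "DatasetReference",
                        "parameters": { "fileName": "@item().name" } }
                    ],
                    "outputs": [
                      { "referenceName": "ds_bronze_file", "type": "DatasetReference",
                        "parameters": { "fileName": "@item().name" } }
                    ],
                    "typeProperties": {
                      "source": { "type": "DelimitedTextSource" },
                      "sink": { "type": "DelimitedTextSink" }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    ]
  }
}
```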
## 🗂 Folder Structure

```text
README.md
│
├── adf
│   ├── datasets
│   ├── linkedServices
│   └── pipelines
│
├── docs
│   └── screenshots
│       ├── logo.png
│       ├── architecture_diagram.png
│       ├── git_ingestion.png
│       └── metadriven_datalake_ingestion.png
│
├── metadata
│   ├── tables.json
│   └── schema
│       ├── nocs.json
│       ├── 2015.json
│       ├── 2016.json
│       ├── 2017.json
│       ├── 2018.json
│       ├── 2019.json
│       ├── 2020.json
│       ├── 2021.json
│       ├── 2022.json
│       └── 2023.json
│
└── source_data
    ├── adls
    │   └── nocs.csv
    │
    └── github
        ├── WHR_2015.csv
        ├── WHR_2016.csv
        ├── WHR_2017.csv
        ├── WHR_2018.csv
        ├── WHR_2019.csv
        ├── WHR_2020.csv
        ├── WHR_2021.csv
        ├── WHR_2022.csv
        └── WHR_2023.csv
```
## 🧪 How to Run

Import the JSON files inside:

- `/adf/pipelines`
- `/adf/datasets`
- `/adf/linkedServices`
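For orientation, a parameterized ADLS dataset of the kind these pipelines depend on might look roughly like this. Names such as `ds_bronze_csv` and `ls_adls` are placeholders, not the repository's actual artifact names.

```json
{
  "name": "ds_bronze_csv",
  "properties": {
    "linkedServiceName": { "referenceName": "ls_adls", "type": "LinkedServiceReference" },
    "parameters": { "fileName": { "type": "string" } },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "bronze",
        "fileName": { "value": "@dataset().fileName", "type": "Expression" }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```

Because `fileName` is a dataset parameter resolved at runtime, one dataset serves every file the ForEach loop emits.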
Configure the linked services:

- ADLS linked service → storage account authentication
- HTTP linked service → GitHub Raw URL access
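As an illustration, the two linked services could be defined roughly as follows. The names and `<…>` placeholders are assumptions; the ADLS example uses account-key authentication, though managed identity works equally well.

```json
{
  "name": "ls_github_http",
  "properties": {
    "type": "HttpServer",
    "typeProperties": {
      "url": "https://raw.githubusercontent.com/<user>/<repo>/main/",
      "authenticationType": "Anonymous"
    }
  }
}
```

```json
{
  "name": "ls_adls",
  "properties": {
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "https://<storage-account>.dfs.core.windows.net",
      "accountKey": { "type": "SecureString", "value": "<account-key>" }
    }
  }
}
```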
Upload:

- `nocs.csv` → ADLS source container
- Year-wise WHR files → GitHub Raw (or keep your existing structure)
Run:
- GitToBronze → to ingest GitHub files
- DatalakeIngestion → to ingest ADLS files
Check:

`adls/bronze/`

All ingested files should appear there.
## 🧰 Tools & Skills

- Azure Data Factory
- ADLS Gen2
- GitHub GitOps
- HTTP Linked Services
- Dynamic ADF pipeline design
- Multi-source ingestion
- Data Lake Bronze zone design
- GitOps artifact management
- ETL orchestration
- Recursive metadata extraction
- Parameterized dataset development
## 📈 Resume Value

- Cloud-native ingestion architecture using Azure Data Factory
- GitOps-based workflow for ADF JSON artifacts
- Multi-source data ingestion (GitHub Raw + ADLS)
- Scalable Bronze zone landing pattern on ADLS Gen2
- Dynamic pipelines using parameters and ForEach loops
- Real-world folder structuring, orchestration, and documentation
## 📬 Contact

If you’d like enhancements:
- Silver & Gold layer
- Power BI reporting layer
- CI/CD pipelines (Azure DevOps or GitHub Actions)
- ADF code refactoring
- Metadata-driven rewrite
Just ask!



