GitOps-Driven Source → Bronze Ingestion Using Azure Data Factory

Multi-source ingestion (GitHub + ADLS) → ADLS Bronze Layer • Dynamic ETL • GitOps-Controlled • Portfolio-Ready Azure Project


🔥 Project Highlights

This project demonstrates real-world Azure Data Engineering ingestion patterns, including:

  • 🔹 Multi-source ingestion: GitHub Raw + ADLS
  • 🔹 Two independent dynamic pipelines
    • GitToBronze — GitHub → Bronze
    • DatalakeIngestion — ADLS → Bronze
  • 🔹 Parameterised datasets for scalable ETL
  • 🔹 GitOps-managed ADF artifacts
  • 🔹 Bronze zone design aligned with Data Lakehouse principles
  • 🔹 Beginner-friendly yet enterprise-style
  • 🔹 Clickable screenshots & architecture diagrams
  • 🔹 Fully documented for portfolio & recruiters
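The parameterised-dataset pattern highlighted above can be sketched as a single generic ADF dataset that takes the file name as a parameter (the dataset, linked service, and parameter names here are illustrative, not the exact ones in `/adf/datasets`):

```json
{
  "name": "ds_bronze_generic",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "ls_adls_bronze",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "fileName": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "bronze",
        "fileName": { "value": "@dataset().fileName", "type": "Expression" }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```

One such dataset can serve every file in a pipeline run — each Copy activity passes a different `fileName` value instead of requiring a dataset per file.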

SEO Keywords:
Azure Data Engineer, ADF Pipeline, ADLS Gen2, Data Lake, GitOps, ETL Project, Azure Portfolio, Source to Bronze, Data Ingestion, Dynamic Pipelines, Cloud Engineering


🚀 Overview

This project builds a clean, scalable Source → Bronze ingestion framework.

Data Sources

  • GitHub Raw Folder
    • WHR datasets: WHR_2015.csv through WHR_2023.csv
  • ADLS Source Container
    • nocs.csv

Destination

  • ADLS Bronze Layer

Pipelines

| Pipeline | Source | Destination | Purpose |
|---|---|---|---|
| GitToBronze | GitHub HTTP | ADLS Bronze | Ingest World Happiness Report datasets (2015–2023) |
| DatalakeIngestion | ADLS Source | ADLS Bronze | Recursively ingest ADLS files |

Together, the two pipelines form a multi-source ingestion layer that mirrors real enterprise setups.


🧠 Architecture

📌 Architecture Diagram

Architecture Diagram


📸 Screenshots

🔷 GitHub → ADF Integration

Git Ingestion


🔷 Data Lake Ingestion Pipeline (ADF)

Data Lake Ingestion


🏗 Pipelines Explained

1️⃣ GitToBronze Pipeline

Purpose:
Ingest year-wise WHR datasets from GitHub → ADLS Bronze.

Steps:

  • Lookup JSON from GitHub Raw
  • ForEach over file list
  • Copy activity: HTTP source → ADLS sink (CSV/Parquet)

Use case: Easily scalable ingestion for large Git-based data repositories.
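The steps above map onto a Lookup → ForEach → Copy pattern. A trimmed sketch of what the pipeline JSON in `/adf/pipelines` looks like (activity and dataset names are illustrative):

```json
{
  "name": "GitToBronze",
  "properties": {
    "activities": [
      {
        "name": "LookupFileList",
        "type": "Lookup",
        "typeProperties": {
          "source": { "type": "JsonSource" },
          "dataset": { "referenceName": "ds_git_filelist", "type": "DatasetReference" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachFile",
        "type": "ForEach",
        "dependsOn": [
          { "activity": "LookupFileList", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "items": {
            "value": "@activity('LookupFileList').output.value",
            "type": "Expression"
          },
          "activities": [
            {
              "name": "CopyGitFileToBronze",
              "type": "Copy",
              "typeProperties": {
                "source": { "type": "DelimitedTextSource" },
                "sink": { "type": "DelimitedTextSink" }
              },
              "inputs": [
                {
                  "referenceName": "ds_git_http_csv",
                  "type": "DatasetReference",
                  "parameters": { "relativeUrl": "@item().fileName" }
                }
              ],
              "outputs": [
                {
                  "referenceName": "ds_bronze_generic",
                  "type": "DatasetReference",
                  "parameters": { "fileName": "@item().fileName" }
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
```

Adding a new WHR year then only requires adding one entry to the lookup JSON — no pipeline change.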


2️⃣ DatalakeIngestion Pipeline

Purpose:
Scan an ADLS folder & ingest files into Bronze.

Steps:

  • Get Metadata (recursive)
  • ForEach over items
  • If Condition (match specific file names)
  • Copy activity: ADLS source → ADLS Bronze sink

Use case: Automating ingestion of new drops in a source folder.
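The Get Metadata → ForEach → If Condition chain above can be sketched as follows (activity and dataset names, and the `.csv` filter expression, are illustrative):

```json
{
  "name": "DatalakeIngestion",
  "properties": {
    "activities": [
      {
        "name": "GetSourceFiles",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": { "referenceName": "ds_adls_source_folder", "type": "DatasetReference" },
          "fieldList": [ "childItems" ]
        }
      },
      {
        "name": "ForEachItem",
        "type": "ForEach",
        "dependsOn": [
          { "activity": "GetSourceFiles", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "items": {
            "value": "@activity('GetSourceFiles').output.childItems",
            "type": "Expression"
          },
          "activities": [
            {
              "name": "IsTargetFile",
              "type": "IfCondition",
              "typeProperties": {
                "expression": {
                  "value": "@endswith(item().name, '.csv')",
                  "type": "Expression"
                },
                "ifTrueActivities": [
                  {
                    "name": "CopyToBronze",
                    "type": "Copy",
                    "typeProperties": {
                      "source": { "type": "DelimitedTextSource" },
                      "sink": { "type": "DelimitedTextSink" }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    ]
  }
}
```

Because Get Metadata returns `childItems`, the same pipeline picks up any new drop in the source folder on the next run without modification.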


🗂 Folder Structure

```
README.md
│
├── adf
│   ├── datasets
│   ├── linkedServices
│   └── pipelines
│
├── docs
│   └── screenshots
│       ├── logo.png
│       ├── architecture_diagram.png
│       ├── git_ingestion.png
│       └── metadriven_datalake_ingestion.png
│
├── metadata
│   ├── tables.json
│   └── schema
│       ├── nocs.json
│       ├── 2015.json
│       ├── 2016.json
│       ├── 2017.json
│       ├── 2018.json
│       ├── 2019.json
│       ├── 2020.json
│       ├── 2021.json
│       ├── 2022.json
│       └── 2023.json
│
└── source_data
    ├── adls
    │   └── nocs.csv
    │
    └── github
        ├── WHR_2015.csv
        ├── WHR_2016.csv
        ├── WHR_2017.csv
        ├── WHR_2018.csv
        ├── WHR_2019.csv
        ├── WHR_2020.csv
        ├── WHR_2021.csv
        ├── WHR_2022.csv
        └── WHR_2023.csv
```


🧪 How to Run

1. Import Pipelines

Import the JSON files from:

  • /adf/pipelines
  • /adf/datasets
  • /adf/linkedServices

2. Configure Linked Services

  • ADLS linked service → update the authentication (account key, SAS, service principal, or managed identity)
  • HTTP linked service → point the base URL at your GitHub Raw location
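For reference, the two linked services generally reduce to JSON like the following (names and the placeholder URLs are illustrative; this sketch assumes anonymous access to GitHub Raw and managed-identity access to ADLS — swap in your own auth settings):

```json
[
  {
    "name": "ls_github_http",
    "properties": {
      "type": "HttpServer",
      "typeProperties": {
        "url": "https://raw.githubusercontent.com/<account>/<repo>/main/",
        "authenticationType": "Anonymous"
      }
    }
  },
  {
    "name": "ls_adls",
    "properties": {
      "type": "AzureBlobFS",
      "typeProperties": {
        "url": "https://<storageaccount>.dfs.core.windows.net"
      }
    }
  }
]
```

Replace the `<account>`, `<repo>`, and `<storageaccount>` placeholders with your own values before publishing.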

3. Upload Data

Upload:

  • nocs.csv → ADLS source
  • Year-wise WHR files → GitHub Raw (OR keep your existing structure)

4. Run Pipelines

Run:

  • GitToBronze → to ingest GitHub files
  • DatalakeIngestion → to ingest ADLS files

5. Validate Output

Check:

adls/bronze/

All files should appear there.


🧰 Tools & Skills

Technologies

  • Azure Data Factory
  • ADLS Gen2
  • GitHub GitOps
  • HTTP Linked Services

Skills Demonstrated

  • Dynamic ADF pipeline design
  • Multi-source ingestion
  • Data Lake Bronze zone design
  • GitOps artifact management
  • ETL orchestration
  • Recursive metadata extraction
  • Parameterized dataset development

📈 Portfolio Value

  • Cloud-native ingestion architecture using Azure Data Factory

  • GitOps-based workflow for ADF JSON artifacts

  • Multi-source data ingestion (GitHub Raw + ADLS)

  • Scalable Bronze zone landing pattern on ADLS Gen2

  • Dynamic pipelines using parameters and ForEach loops

  • Real-world folder structuring, orchestration, and documentation


📬 Contact

If you’d like this project extended with:

  • Silver & Gold layer
  • Power BI reporting layer
  • CI/CD pipelines (Azure DevOps or GitHub Actions)
  • ADF code refactoring
  • Metadata-driven rewrite

Just ask!
