
🚀 Azure Data Factory Pro – Enterprise Data Ingestion Framework

This repository represents a modular Azure Data Factory ingestion framework, where each module addresses a real-world data engineering scenario commonly found in enterprise data platforms.


📌 Project Overview

This project showcases a production-grade Azure Data Engineering solution built using Azure Data Factory (ADF). Instead of focusing on a single pipeline, the project is designed as a collection of reusable ingestion patterns, reflecting how real enterprise data platforms are built and maintained.

Each module solves a specific data engineering problem such as incremental database loads, API ingestion, schema variability, monitoring, and CI/CD.


🎯 Why a Modular Design?

In real-world organizations:

  • Data comes from multiple source types (databases, APIs, files)

  • Each source has different ingestion challenges

  • Pipelines must be scalable, reusable, and easy to maintain

  • Monitoring and deployment are platform-level responsibilities

👉 This project is intentionally modular to demonstrate:

  • Pattern-based engineering

  • Platform thinking over one-off pipelines

  • How Data Engineers design enterprise-ready ingestion frameworks


📦 Core Modules & Real-World Scenarios

🔹 Incremental SQL Data Load (Delta Strategy)

  • Scenario: Transactional databases where only new or updated records should be processed.

  • Timestamp-based watermark (last_updated)

  • Handles late-arriving / backdated data

  • Avoids full-table reloads

  • Cost-efficient and scalable design

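The watermark pattern above can be sketched in Python. This is illustrative only: the helper names are hypothetical, while `last_updated` is the watermark column named in this module.

```python
from datetime import datetime

def build_incremental_query(table: str, watermark: datetime) -> str:
    # Delta query: only rows touched after the stored watermark are read,
    # which avoids a full-table reload on every run.
    return (
        f"SELECT * FROM {table} "
        f"WHERE last_updated > '{watermark.isoformat(sep=' ')}'"
    )

def advance_watermark(stored: datetime, batch_max: datetime) -> datetime:
    # Late-arriving / backdated rows may carry timestamps at or before the
    # stored watermark; the watermark itself only ever moves forward.
    return max(stored, batch_max)
```

In ADF this same logic lives in a Lookup activity (read the stored watermark), a parameterized source query, and a Stored Procedure activity that advances the watermark after a successful copy.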

🔹 REST API Ingestion with Dynamic Pagination

  • Scenario: Third-party APIs that return data in pages.

  • Range-based pagination

  • Automatically retrieves all available records

  • Scales without manual looping

  • API rate-limit aware design

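A minimal Python sketch of the range-based pagination loop (function and parameter names are illustrative, not the module's actual implementation):

```python
def fetch_all(fetch_page, page_size=100):
    # Range-based pagination: request fixed-size pages until a short
    # (or empty) page signals that every record has been retrieved.
    records, offset = [], 0
    while True:
        page = fetch_page(offset=offset, limit=page_size)
        records.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return records
```

In ADF the equivalent is an Until activity driving a parameterized Copy/Web activity, with the offset held in a pipeline variable; a Wait activity between iterations keeps the loop within API rate limits.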

🔹 Metadata-Driven File Ingestion

  • Scenario: Regular file drops from multiple teams or vendors.

  • Single dynamic dataset

  • Auto-discovery of files

  • ForEach + Switch-based routing

  • Eliminates pipeline and dataset sprawl

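The ForEach + Switch routing can be sketched as a small metadata-driven dispatcher. The route table and sink paths below are hypothetical examples, not the repository's actual configuration:

```python
import fnmatch

# Hypothetical metadata table: one dynamic dataset, routing driven by config
FILE_ROUTES = [
    {"pattern": "customers_*.csv", "sink": "bronze/customers"},
    {"pattern": "trips_*.json",    "sink": "bronze/trips"},
]

def route_file(filename):
    # Switch-style routing: the first matching pattern wins
    for rule in FILE_ROUTES:
        if fnmatch.fnmatch(filename, rule["pattern"]):
            return rule["sink"]
    return "quarantine"  # unknown files are isolated rather than failing the run
```

Adding a new vendor feed then means adding one row of metadata, not a new pipeline or dataset — which is what eliminates the sprawl.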

🔹 Dynamic Schema Mapping

  • Scenario: Multiple business entities with different schemas but identical ingestion flow.

  • Schema mappings passed as JSON parameters

  • Runtime schema selection

  • One pipeline supports multiple structures

  • Prevents schema-related pipeline failures

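The runtime mapping idea can be sketched as follows; the entity names and column mappings are invented for illustration, standing in for the JSON parameter the pipeline receives:

```python
import json

# Hypothetical per-entity mappings, passed to the pipeline as a JSON parameter
SCHEMA_MAPPINGS = json.loads("""
{
  "customers": {"cust_id": "CustomerId", "full_nm": "FullName"},
  "drivers":   {"drv_id":  "DriverId",   "lic_no":  "LicenseNumber"}
}
""")

def apply_mapping(entity, row):
    mapping = SCHEMA_MAPPINGS[entity]  # runtime schema selection
    # Rename source columns to target names; unmapped columns are dropped,
    # so an unexpected extra field never breaks the pipeline.
    return {target: row[src] for src, target in mapping.items() if src in row}
```

In ADF this corresponds to feeding the mapping JSON into the Copy activity's dynamic `mapping` property, so one pipeline serves every entity.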

🔹 Monitoring & Failure Alerts

  • Scenario: Production pipelines requiring immediate operational visibility.

  • Azure Logic App integration

  • Automated email alerts

  • Captures pipeline context (run ID, table, pipeline name)

  • Covers silent and skipped failures

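The pipeline context captured for an alert can be sketched as the JSON body posted to the Logic App's HTTP trigger. The field names below are assumptions for illustration; they must match whatever schema the Logic App's Request trigger defines:

```python
import json

def build_alert_payload(pipeline_name, run_id, table, error_message):
    # Body posted to the Logic App webhook on failure; the Logic App
    # formats these fields into the alert email.
    return json.dumps({
        "pipelineName": pipeline_name,
        "runId": run_id,
        "table": table,
        "errorMessage": error_message,
    })
```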

🔹 CI/CD with GitHub Integration

  • Scenario: Multiple engineers working on shared data pipelines.

  • Git-based version control

  • Feature branching & pull requests

  • ARM & YAML artifact generation

  • Safe deployments and rollback capability


🧠 Skills Demonstrated

| Module | Problem Solved | Key Tech |
| --- | --- | --- |
| Incremental Loads | Optimized delta ingestion | ADF, SQL |
| API Pagination | Scalable ingestion of third-party APIs | ADF, REST |
| Metadata-Driven Files | Multi-file ingestion | ADF, Params |
| Schema Mapping | Dynamic schema support | ADF, JSON Mappings |
| Monitoring | Alerts & operational readiness | ADF + Logic Apps |
| CI/CD | GitOps workflow | GitHub + ADF |

🏁 Conclusion

This project establishes a high-maturity Metadata-Driven Data Engineering Framework on Azure, transitioning from static ETL tasks to enterprise-grade orchestration. By decoupling logic from data and implementing a self-cleaning, event-driven architecture, the platform achieves:

  • Scalability & Reusability: Leveraged Parameterization and Dynamic Mapping to handle multi-entity ingestion (Customers, Drivers, Trips) through a single code path, minimizing technical debt.

  • Cost & Resource Optimization: Integrated Watermark Patterns for Incremental Loading (Delta Loads) and Logical Gating, ensuring Azure Consumption is limited only to changed datasets.

  • Operational Resilience: Implemented automated Data Validation, REST API Pagination logic, and Logic App Webhooks for real-time monitoring and proactive error alerting.

  • DataOps & CI/CD Excellence: Developed a robust Software Development Lifecycle (SDLC) using GitHub Version Control, Feature Branching, and automated ARM/YAML Template generation for seamless multi-environment deployment.

This framework represents a modern, Production-Ready approach to building sustainable, cost-effective, and Metadata-Driven data platforms in the cloud.

About

End-to-end Metadata-Driven Data Engineering framework built on Azure. Features dynamic SQL/REST API ingestion with range pagination, automated schema mapping, and event-driven orchestration. Implements robust CI/CD via GitHub Actions/YAML and automated failure alerting with Logic Apps. Optimized for scalability and DE best practices.
