Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
Updated Jan 21, 2020 - Scala
E-commerce pipelines on GCP. Streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery, and Tableau. Batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner, and Tableau.
AI-powered data sanitizer with schema detection, dedupe, outlier detection, and LLM enrichment.
Real Time Data Streaming Pipeline
Collecting highlights from the Quix community and social media in the form of interesting questions, comments, challenges, solutions and insights
This project implements a modern data engineering pipeline using Databricks, PySpark, dbt, and Delta Live Tables. It follows the Medallion Architecture, supports real-time data ingestion with Auto Loader, and models data with fact and dimension tables, including Slowly Changing Dimensions (SCD Type 2), all orchestrated in a scalable cloud environment.
Stream data directly from an API using Apache Beam to BigQuery.
Ownership-aware reactive streaming runtime on the WebAssembly Component Model
Docs-only case study of a compliance & anomaly detection platform on Azure + Databricks (Streaming ETL + Batch ELT + ML).
Streaming pipeline using AWS MSK and AWS EMR with Spark, retrieving the data from Twitter Streams API
Contract-first Azure streaming data product built with Event Hubs, Spark Structured Streaming, and Delta Lake. Designed for determinism, auditability, and idempotent metrics computation.
Kafka-based real-time cryptocurrency data ingestion pipeline with Python and MongoDB
Master's degree | Data Engineering | Final course projects | goit-de-fp
Docs-only case study – Compliance Reporting data platform on Azure for a Big-4 Audit & Consulting Firm (BFSI, healthcare-style datasets) using Streaming Pipeline (ETL) + Batch Pipeline (ELT) with Snowflake, Synapse, ADF, Power BI, ML risk scoring, DQ, governance, and lineage.
Data Engineer Training Using Google Cloud Platform
End-to-end aircraft risk detection pipeline using Kafka, Spark Structured Streaming, and ML