Pinned
-
nyc-taxi-analytics (Public): High-performance SQL analytics on NYC TLC Yellow Taxi Parquet files using DuckDB, no warehouse needed.
Python
-
hacker-news-data-lake (Public): Bronze/Silver/Gold data lake on the Hacker News Firebase API, orchestrated by Airflow with MinIO and partitioned Parquet. Async httpx ingestion, ExternalTaskSensor-gated DAG dependencies, DuckDB-bu…
Python
-
kafka-streaming-pipeline (Public): Real-time e-commerce events: Python producer -> Kafka KRaft -> Spark Structured Streaming -> Postgres -> Streamlit dashboard. Windowed aggregations, anomaly detection, full docker-compose stack.
Python
-
open-data-etl (Public): Batch ETL pipeline ingesting the French open-data DVF dataset (real-estate transactions) into a Parquet star schema with DuckDB views. Polars streaming, idempotent download, partitioned warehouse.
Python


