A data pipeline that integrates the Gaia DR3 and SDSS DR13 catalogs using ADQL and Python to model and predict interstellar extinction and to generate large-scale sky maps.
This project demonstrates SQL-based data extraction, ETL workflows, feature engineering, supervised regression modeling, and visualization of astrophysical data at scale.
The goal of this project is to predict interstellar extinction using astrometric and photometric measurements from Gaia, cross-matched with SDSS extinction measurements.
The pipeline includes:
- ADQL query to extract cross-matched Gaia and SDSS sources
- Data extraction using astroquery TAP interface
- ETL workflow: raw data extraction and transformation
- Feature engineering using astrometric quality and photometric parameters
- Supervised machine learning regression model
- Estimation of statistically calibrated confidence intervals
- Visualization of predicted extinction
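The regression and interval steps listed above can be illustrated with a small sketch. The regressor and the calibration method are not named here, so everything below is an assumption: a linear least-squares fit stands in for the model, split conformal prediction stands in for the interval calibration, the data are synthetic, and `conformal_halfwidth` is a hypothetical helper name.

```python
import numpy as np

def conformal_halfwidth(y_cal, pred_cal, alpha=0.1):
    """Half-width q so that [pred - q, pred + q] covers roughly 1 - alpha."""
    scores = np.sort(np.abs(np.asarray(y_cal) - np.asarray(pred_cal)))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(scores[k - 1])

rng = np.random.default_rng(0)
n = 3000
# Synthetic stand-ins for two features and an extinction-like target;
# these are NOT real Gaia or SDSS values.
X = rng.normal(size=(n, 2))
y = 0.8 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.2, size=n)

# Disjoint index sets: fit on train, calibrate on cal, evaluate on test.
train, cal, test = np.split(rng.permutation(n), [1000, 2000])

# Placeholder regressor: ordinary least squares with an intercept column.
A = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
pred = A @ coef

q = conformal_halfwidth(y[cal], pred[cal], alpha=0.1)
lower, upper = pred[test] - q, pred[test] + q
coverage = np.mean((y[test] >= lower) & (y[test] <= upper))
print(f"half-width={q:.3f}, empirical coverage={coverage:.3f}")
```

The calibration set is kept separate from the training set so that the residual quantile is not biased by overfitting. Any fitted regressor could replace the least-squares step without changing the interval logic.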
Data sources:
- Gaia DR3: astrometry and photometry
- SDSS DR13: extinction measurements (extinction_u)
- Cross-match: gaiadr3.sdssdr13_best_neighbour
Access via ESA Gaia Archive TAP service.
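
A minimal sketch of how such an extraction might look with the astroquery TAP interface follows. Only `gaiadr3.sdssdr13_best_neighbour` and `extinction_u` come from the description above. The SDSS table name `external.sdssdr13_photoprimary`, the join columns `original_ext_source_id` and `objid`, the selected Gaia columns, and the helper names `build_query` and `fetch` are assumptions that should be checked against the archive schema before use.

```python
def build_query(limit=1000):
    """Return an ADQL string joining Gaia DR3 sources to SDSS DR13 neighbours."""
    return f"""
    SELECT TOP {int(limit)}
        g.source_id, g.ra, g.dec,
        g.parallax, g.parallax_error,
        g.phot_g_mean_mag, g.bp_rp, g.ruwe,
        s.extinction_u
    FROM gaiadr3.gaia_source AS g
    JOIN gaiadr3.sdssdr13_best_neighbour AS x
        ON g.source_id = x.source_id
    -- Assumed SDSS table and key columns; verify in the archive schema.
    JOIN external.sdssdr13_photoprimary AS s
        ON x.original_ext_source_id = s.objid
    """

def fetch(limit=1000):
    """Run the query asynchronously and return a pandas DataFrame."""
    # Imported lazily: needs the astroquery package and network access.
    from astroquery.gaia import Gaia
    job = Gaia.launch_job_async(build_query(limit))
    return job.get_results().to_pandas()

if __name__ == "__main__":
    # Print the query only; call fetch() to actually contact the archive.
    print(build_query(limit=10))
```

Starting with a small `TOP` limit keeps the job quick while the column list and joins are being validated; the limit can then be raised or removed for the full extraction.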