This project is an end-to-end data engineering pipeline that ingests, processes, and analyzes economic data from the FRED API.
Data is stored using the medallion architecture (Bronze, Silver, Gold) in Databricks Lakehouse with PySpark and Delta Lake.
A Logistic Regression model, tracked with MLflow, predicts US recession probability with 92.5% accuracy.
Results are visualized in Power BI and/or a Databricks Dashboard, accessible through this Streamlit app.
The pipeline runs automatically on the first Tuesday of every month and sends email alerts for data quality issues and model drift.
FRED API
-> Bronze Layer (indicators, observations, ingestion_log, dq_log)
-> Silver Layer (wide_macro_indicators)
-> Gold Layer (annual_macro_summary, recession_periods, recession_predictions, monitoring_log)
-> ML Model (Logistic Regression, MLflow tracking)
-> Monitoring (drift detection, email alerts, retrain trigger)
-> Visualization (Power BI, Databricks Dashboard, Streamlit)
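The Bronze-to-Silver step in the flow above reshapes long observations (one row per series and date) into one wide row per month. The following is a minimal standard-library sketch of that reshaping, not the project's PySpark code. The function name `pivot_to_monthly_wide`, the tuple layout, and the choice to average higher-frequency series within a month are all assumptions made for illustration, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean


def pivot_to_monthly_wide(observations):
    """Pivot (series_id, 'YYYY-MM-DD', value) rows into one dict per month.

    Assumption: values sharing a series and month (daily or weekly series)
    are averaged; the real pipeline may aggregate differently.
    """
    buckets = defaultdict(list)  # (month, series_id) -> list of values
    for series_id, date_str, value in observations:
        if value is None:  # skip missing observations
            continue
        buckets[(date_str[:7], series_id)].append(value)

    wide = defaultdict(dict)  # month -> {series_id: monthly value}
    for (month, series_id), values in buckets.items():
        wide[month][series_id] = mean(values)

    # One row per month, oldest first
    return [{"month": month, **wide[month]} for month in sorted(wide)]


# Made-up sample rows, not real FRED values
rows = [
    ("UNRATE", "2020-03-01", 4.0),
    ("DGS10", "2020-03-02", 1.0),
    ("DGS10", "2020-03-03", 2.0),
    ("UNRATE", "2020-04-01", 5.0),
]
wide_rows = pivot_to_monthly_wide(rows)
```

Averaging is only one way to align daily, weekly, and quarterly series on a monthly grid; taking the last value of the month or forward-filling quarterly figures are equally plausible choices.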
| Category | Tools |
|---|---|
| Cloud & Storage | Databricks, Delta Lake, Unity Catalog |
| Processing | PySpark, Apache Spark |
| ML & Tracking | Scikit-Learn, MLflow |
| Orchestration | Databricks Workflows |
| Data Source | FRED API |
| Visualization | Power BI, Databricks Dashboard |
| App | Streamlit |
| CI/CD | GitHub Actions |
The data source for this project is the free FRED API, which is actively maintained by the Federal Reserve Bank of St. Louis. The pipeline uses 8 core indicators:
| Sl No | Indicator | Description | Frequency |
|---|---|---|---|
| 1 | FEDFUNDS | Federal Funds Interest Rate | Monthly |
| 2 | CPIAUCSL | Consumer Price Index (Inflation) | Monthly |
| 3 | MORTGAGE30US | 30-Year Fixed Mortgage Rate | Weekly |
| 4 | UNRATE | Unemployment Rate | Monthly |
| 5 | DGS10 | 10-Year Treasury Yield | Daily |
| 6 | GDP | Gross Domestic Product | Quarterly |
| 7 | M2SL | M2 Money Supply | Monthly |
| 8 | USREC | US Recession Indicator | Monthly |
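As a rough illustration of how one of the indicators above could be pulled, here is a small sketch against the FRED `series/observations` endpoint using only the standard library. The helper names (`build_observations_url`, `parse_observations`, `fetch_series`) are hypothetical and not taken from the project's notebooks, and the real Bronze ingestion may use a different client and write to Delta tables instead of returning tuples.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_observations_url(series_id, api_key, observation_start=None):
    """Build the request URL for one series, asking for JSON output."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if observation_start:
        params["observation_start"] = observation_start  # 'YYYY-MM-DD'
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def parse_observations(series_id, payload):
    """Flatten a FRED JSON payload into (series_id, date, value) tuples.

    FRED reports missing values as the string "."; these become None.
    """
    rows = []
    for obs in payload.get("observations", []):
        raw = obs["value"]
        value = None if raw == "." else float(raw)
        rows.append((series_id, obs["date"], value))
    return rows


def fetch_series(series_id, api_key):
    """Download and parse one series (needs network access and a valid key)."""
    url = build_observations_url(series_id, api_key)
    with urlopen(url, timeout=30) as resp:
        return parse_observations(series_id, json.load(resp))
```

Keeping URL construction and parsing separate from the network call makes both easy to check offline, for example with a hand-written payload.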
The pipeline is orchestrated with Databricks Jobs & Pipelines and runs automatically on the first Tuesday of every month.
| Task | Description |
|---|---|
| Bronze Ingestion | Pulls 8 FRED indicators via API, stores raw data in Delta tables with audit logging and DQ checks |
| Silver Transformation | Pivots observations into a wide table, adds recession flags and quarter start indicators |
| Gold Analytics | Computes annual summaries and recession periods |
| ML Inference | Runs recession probability predictions using saved MLflow model |
| Monitoring | Detects drift, sends email alerts, triggers retraining if needed |
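The "first Tuesday of every month" schedule can be sanity-checked in plain Python. The sketch below is illustrative only: `first_tuesday` and `is_scheduled_run_day` are hypothetical helpers, not part of the project. Databricks job schedules use Quartz cron syntax, where an expression such as `0 0 6 ? * 3#1` would mean 06:00 on the first Tuesday; the time of day here is an assumption, since the source does not state it.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday(): Monday is 0, so Tuesday is 1


def first_tuesday(year, month):
    """Return the date of the first Tuesday in the given month."""
    first = date(year, month, 1)
    offset = (TUESDAY - first.weekday()) % 7  # days until the next Tuesday
    return first + timedelta(days=offset)


def is_scheduled_run_day(day):
    """True if `day` is the first Tuesday of its month."""
    return day == first_tuesday(day.year, day.month)
```

A check like this can be handy in a notebook guard or a unit test to confirm that a manual run lines up with the scheduled cadence.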
A Logistic Regression model trained on 10 features predicts the monthly probability of a US recession.
| Detail | Value |
|---|---|
| Algorithm | Logistic Regression (PySpark MLlib) |
| Features | FEDFUNDS, DGS10, UNRATE, MORTGAGE30US, yield_spread, M2SL, CPI + 3-month lags |
| Class Weight | 7.6x for recession months |
| AUC | 0.925 |
| Accuracy | 92.5% |
| Recall | 92.5% |
| Precision | 96.5% |
| Tracking | MLflow |
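Two details in the table above, the 3-month lag features and the class weight for recession months, can be sketched in a few lines. This is an assumption-laden illustration: the source does not say which columns are lagged or how the 7.6x weight was derived, so `add_lag_features` and `positive_class_weight` are hypothetical helpers, and weighting by the ratio of non-recession to recession months is just one common way to arrive at such a number.

```python
def add_lag_features(rows, columns, lag=3):
    """Return copies of monthly rows with `<col>_lag<lag>` columns added.

    `rows` is a list of dicts ordered by month, oldest first. The first
    `lag` rows have no history, so their lag values are None.
    """
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        for col in columns:
            new_row[f"{col}_lag{lag}"] = rows[i - lag][col] if i >= lag else None
        out.append(new_row)
    return out


def positive_class_weight(labels):
    """Ratio of negative (0) to positive (1) labels.

    Assumption: the minority class is weighted by how much rarer it is.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples to weight")
    return negatives / positives


# Hypothetical labels: 2 recession months out of 17
weight = positive_class_weight([1, 1] + [0] * 15)
```

Rows with `None` lag values would typically be dropped before training, which shortens the usable history by the lag length.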
Monitoring is set up to detect data drift and data quality issues. If either occurs, an email alert is sent using Python's smtplib library.
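A minimal sketch of such an alert, using `smtplib` and `email.message.EmailMessage` from the standard library, might look like the following. The function names, the subject line, and the use of STARTTLS with a username and password are assumptions; the project's actual message format and SMTP settings are not described here.

```python
import smtplib
from email.message import EmailMessage


def build_alert_email(sender, recipient, issues):
    """Compose a plain-text alert listing each detected issue."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[Macro pipeline] {len(issues)} monitoring issue(s) detected"
    msg.set_content("\n".join(f"- {issue}" for issue in issues))
    return msg


def send_alert(msg, host, port, username, password):
    """Send the message over SMTP with STARTTLS (assumed server setup)."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)
```

Building the message separately from sending it lets the monitoring notebook log or inspect the alert even when no SMTP server is reachable.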
- Clone the repo
- Get a free FRED API key at fred.stlouisfed.org
- Set up a Databricks workspace and create the Unity Catalog `us_macroeconomics_tracker`
- Add your secrets to `.streamlit/secrets.toml`
- Run the notebooks in order: `US_Macroeconomics_Tracker` -> `Recession_Predictor` -> `Gold_Model_Monitoring`
- Add the Dashboard file to your workspace and publish it
- Follow the instructions inside the other folders
- Run the app: `streamlit run app/app.py`
- October 2023 saw the highest predicted recession probability (32%), coinciding with the 10-year Treasury yield hitting 5% for the first time since 2007
- The 2020 COVID recession was correctly predicted, with high probability spikes in March-April 2020
- The 2008 Financial Crisis showed sustained high recession probabilities from late 2007 through 2009
- Current recession risk (Jan 2026) is very low at 0.33%, suggesting stable economic conditions
- 92.5% model accuracy with only 48 misclassifications out of 654 historical months
- Add more economic indicators (PCE, housing starts, consumer confidence)
- Experiment with ensemble models (Random Forest, XGBoost) and hyperparameter tuning
- Build a REST API layer using FastAPI to create a more interactive web application
- Add real-time streaming with Apache Kafka













