GRU Forecasting Pipeline for Ad Creative Performance

📖 Overview

This repository implements a production-grade machine learning pipeline to forecast the effectiveness of digital advertising creatives. It uses a Gated Recurrent Unit (GRU) network, a type of Recurrent Neural Network (RNN), to predict the number of leads a creative generates (leads) by modeling temporal dependencies in historical performance data.

Unlike standard regression models, which treat each observation independently, a GRU uses gates to carry information across time steps, so it can capture sequential patterns and longer-range dependencies in time-series data. This makes it a strong alternative to Temporal Convolutional Network (TCN) approaches.

🚀 Key Features

Hybrid Data Ingestion: Merges time-series performance metrics (CSV) with static creative attributes parsed from complex, nested JSON files.
Automated Feature Engineering:
- Extracts NLP features (e.g., keyword presence, question length) from ad copy.
- Engineers rolling-window statistics (7-day/14-day means) for trend capture.
Bayesian Hyperparameter Tuning: Utilizes Optuna to optimize RNN architecture (Hidden Dimensions, Stacked Layers, Dropout) with cross-validation.
Robust Evaluation: Compares deep learning forecasts against a custom "Realistic Baseline" (Efficiency = Target/Spend) to ensure true model lift.
Production Safety: Implements strict temporal splitting to prevent data leakage and handles missing data with forward-filling logic.
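
The rolling-window and temporal-splitting ideas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in this repository: the function names, the sample lead counts, and the choice to exclude the current day from each window are all assumptions made for the example.

```python
from datetime import date, timedelta


def trailing_mean(values, window):
    """Mean of up to `window` values strictly before each position.

    Excluding the current value is an assumption of this sketch; it keeps
    the feature free of same-day leakage. Positions with no history get None.
    """
    features = []
    for i in range(len(values)):
        history = values[max(0, i - window):i]
        features.append(sum(history) / len(history) if history else None)
    return features


def temporal_split(rows, cutoff):
    """Split rows by date so every training row precedes every test row."""
    train = [row for row in rows if row["date"] < cutoff]
    test = [row for row in rows if row["date"] >= cutoff]
    return train, test


# Hypothetical daily lead counts for a single creative.
start = date(2024, 1, 1)
leads = [4, 6, 5, 9, 7, 8, 10, 12, 11, 9]
rows = [{"date": start + timedelta(days=i), "leads": n} for i, n in enumerate(leads)]

# Attach a 7-day trailing mean to each row.
for row, mean_7d in zip(rows, trailing_mean(leads, window=7)):
    row["leads_mean_7d"] = mean_7d

# Everything before the cutoff is training data; the rest is held out.
train_rows, test_rows = temporal_split(rows, cutoff=date(2024, 1, 9))
```

Splitting on a date rather than shuffling rows means the model is never evaluated on days that precede its training data.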

🛠️ Technical Architecture

1. Data Extraction (`sql/`)

The raw performance data is extracted from the company's data warehouse using the scripts located in the sql/ directory:

data_extraction_train.sql: Extracts historical data for model training (applying business logic filters).
data_extraction_inf.sql: Extracts recent data for daily inference batches.

2. Data Processing (`data_processing.py`)

JSON Parsing: Flattens nested JSON hierarchies to extract creative metadata (e.g., button_style, mood, colors).
Cleaning: Handles messy real-world data, including outlier removal and NaN imputation for "cold start" ads.
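
As a rough illustration of the JSON flattening step, the sketch below turns a nested creative file into one flat row. The field names button_style, mood, and colors come from the description above; the nesting, the creative_id key, and the flatten helper are hypothetical, since the real schema is not shown in this README.

```python
import json


def flatten(node, prefix=""):
    """Recursively flatten nested dicts into one dict with dotted keys.

    Lists (assumed here to hold scalars) are joined into a single
    pipe-separated string so each creative stays on one row.
    """
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        elif isinstance(value, list):
            flat[path] = "|".join(str(item) for item in value)
        else:
            flat[path] = value
    return flat


# Hypothetical creative file contents.
raw = """
{
  "creative_id": "ad_001",
  "visual": {"mood": "energetic", "colors": ["red", "white"]},
  "cta": {"button_style": "rounded"}
}
"""

features = flatten(json.loads(raw))
# features now has the keys: creative_id, visual.mood, visual.colors, cta.button_style
```

Dotted keys keep the original hierarchy visible in the column names, which helps when tracing a feature back to its place in the source file.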

3. Modeling Strategy (`modeling.py`)

Model: GRU (Gated Recurrent Unit) implemented via the Darts library (RNNModel).
Objective: Minimize Mean Absolute Error (MAE) on the validation set.
Search Space:
- hidden_dim: 16 - 64 (Size of the internal memory state)
- n_rnn_layers: 1 - 3 (Depth of the network)
- dropout: 0.1 - 0.4 (Regularization)
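
The search space above might be expressed with Optuna roughly as follows. The ranges are taken from the list; the helper names, the number of trials, and the `validation_mae` callable are assumptions for the sake of the example and are not taken from modeling.py.

```python
def suggest_gru_params(trial):
    """Sample one configuration from the search space listed above."""
    return {
        "hidden_dim": trial.suggest_int("hidden_dim", 16, 64),
        "n_rnn_layers": trial.suggest_int("n_rnn_layers", 1, 3),
        "dropout": trial.suggest_float("dropout", 0.1, 0.4),
    }


def run_study(validation_mae, n_trials=20):
    """Minimize validation MAE with Optuna and return the best parameters.

    `validation_mae` is a hypothetical callable: it receives the sampled
    parameters, trains a model, and returns the MAE on the validation set.
    """
    import optuna  # imported lazily so the sampler above has no hard dependency

    study = optuna.create_study(direction="minimize")
    study.optimize(
        lambda trial: validation_mae(suggest_gru_params(trial)),
        n_trials=n_trials,
    )
    return study.best_params
```

Inside `validation_mae`, the sampled values would typically be passed to the Darts `RNNModel` constructor with `model="GRU"`, which accepts `hidden_dim`, `n_rnn_layers`, and `dropout` as keyword arguments. Note that PyTorch's recurrent layers apply dropout only between stacked layers, so the sampled dropout value may have no effect when `n_rnn_layers` is 1.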

4. Evaluation (`utils.py`)

The pipeline evaluates models using a suite of regression and business metrics:

MAE / RMSE: For raw error magnitude.
sMAPE: For relative error that stays defined when actual values are zero (unlike MAPE).
Bias (ME): To detect systematic over/under-forecasting.
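
For reference, the metrics above can be written out in plain Python as below. This is a sketch under stated assumptions rather than the contents of utils.py: the sign convention for bias (forecast minus actual), the handling of 0-vs-0 pairs in sMAPE, and the way the efficiency baseline is turned into a forecast (historical leads per unit of spend multiplied by planned spend) are all choices made for this example.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def smape(actual, forecast):
    """Symmetric MAPE in percent (0 to 200); a 0-vs-0 pair counts as zero error."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom > 0:
            total += abs(f - a) / denom
    return 100 * total / len(actual)


def mean_error(actual, forecast):
    """Bias; positive means over-forecasting under this sign convention."""
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)


def efficiency_baseline(past_leads, past_spend, future_spend):
    """Assumed baseline: historical leads per unit of spend times planned spend."""
    total_spend = sum(past_spend)
    efficiency = sum(past_leads) / total_spend if total_spend else 0.0
    return [efficiency * spend for spend in future_spend]


# Hypothetical numbers, for illustration only.
actual = [0, 2, 4]
model_forecast = [1, 2, 3]
baseline_forecast = efficiency_baseline(
    past_leads=[4, 6], past_spend=[8.0, 12.0], future_spend=[4.0, 4.0, 4.0]
)

# Lift is positive when the model's error is lower than the baseline's.
lift = mae(actual, baseline_forecast) - mae(actual, model_forecast)
```

Reporting the model's error next to the baseline's on the same horizon shows whether the GRU adds anything beyond a simple spend-proportional rule.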

📦 Installation

Clone the repository:

git clone https://github.com/yourusername/gru-forecasting-pipeline.git
cd gru-forecasting-pipeline

Install dependencies:
```
pip install -r requirements.txt
```

⚙️ Usage

Configuration: Update config.py with your local data paths:

PERFORMANCE_CSV_PATH = "./data/database_full.csv"
CREATIVE_JSON_DIR = "./data/jsons/"

Run the Pipeline: Execute the main orchestration script:
```
python main.py
```
This will ingest data, run feature engineering, optimize the GRU model using Optuna, and output performance metrics.
Run Inference: Generate forecasts for your ads:
```
python inference.py
```

📊 Visualizations

The pipeline automatically generates diagnostic plots in the results/ directory:

Forecast vs. Actuals: Time-series overlay of predictions.
Residual Analysis: Distribution of forecast errors over time.

Author

Luciën Tuijp
