lucien150/gru-forecasting-pipeline

GRU Forecasting Pipeline for Ad Creative Performance

📖 Overview

This repository implements a production-grade machine learning pipeline to forecast the effectiveness of digital advertising creatives. It uses a Gated Recurrent Unit (GRU) network, a type of Recurrent Neural Network (RNN), to predict the number of leads each creative generates by modeling temporal dependencies in historical performance data.

Unlike standard regression models, which typically treat each observation independently, the GRU architecture is designed to capture sequential patterns and longer-range dependencies in time-series data, making it a strong alternative to Temporal Convolutional Network (TCN) approaches.

🚀 Key Features

  • Hybrid Data Ingestion: Merges time-series performance metrics (CSV) with static creative attributes parsed from complex, nested JSON files.
  • Automated Feature Engineering:
    • Extracts NLP features (e.g., keyword presence, question length) from ad copy.
    • Engineers rolling-window statistics (7-day/14-day means) for trend capture.
  • Bayesian Hyperparameter Tuning: Utilizes Optuna to optimize RNN architecture (Hidden Dimensions, Stacked Layers, Dropout) with cross-validation.
  • Robust Evaluation: Compares deep learning forecasts against a custom "Realistic Baseline" (Efficiency = Target/Spend) to ensure true model lift.
  • Production Safety: Implements strict temporal splitting to prevent data leakage and handles missing data with forward-filling logic.
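
To make the rolling-window and temporal-splitting ideas concrete, here is a minimal, dependency-free sketch. It is not the repository's implementation: the function names, the column names (`leads_mean_7d`, `leads_mean_14d`), the sample values, and the choice to exclude the current day from each window are all assumptions made for illustration.

```python
from datetime import date, timedelta


def rolling_mean_features(values, windows=(7, 14)):
    """Trailing means over the previous `w` days, excluding the current day.

    Excluding day i from its own window is an assumption of this sketch; it
    keeps the feature free of information from the value being predicted.
    """
    features = {w: [] for w in windows}
    for i in range(len(values)):
        for w in windows:
            history = values[max(0, i - w):i]  # strictly before day i
            features[w].append(sum(history) / len(history) if history else None)
    return features


def temporal_split(rows, cutoff):
    """Rows dated before `cutoff` go to train, the rest to validation."""
    train = [r for r in rows if r["date"] < cutoff]
    valid = [r for r in rows if r["date"] >= cutoff]
    return train, valid


# Hypothetical daily lead counts for a single ad.
start = date(2024, 1, 1)
leads = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
rows = [{"date": start + timedelta(days=i), "leads": v} for i, v in enumerate(leads)]

feats = rolling_mean_features(leads)
for row, m7, m14 in zip(rows, feats[7], feats[14]):
    row["leads_mean_7d"] = m7
    row["leads_mean_14d"] = m14

# No shuffling: every validation row is later than every training row.
train, valid = temporal_split(rows, cutoff=date(2024, 1, 9))
```

Splitting by date rather than at random means the model is never validated on days that precede its training data, which is the leakage the "Production Safety" bullet refers to.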

🛠️ Technical Architecture

1. Data Extraction (sql/)

The raw performance data is extracted from the company's data warehouse using the scripts located in the sql/ directory:

  • data_extraction_train.sql: Extracts historical data for model training, applying business-logic filters.
  • data_extraction_inf.sql: Extracts recent data for daily inference batches.

2. Data Processing (data_processing.py)

  • JSON Parsing: Flattens nested JSON hierarchies to extract creative metadata (e.g., button_style, mood, colors).
  • Cleaning: Handles messy real-world data, including outlier removal and NaN imputation for "cold start" ads.
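
As an illustration of the JSON-parsing step, the sketch below flattens a nested structure into dotted column names using only the standard library. The JSON layout, the `flatten` helper, and the sample values are hypothetical; only the field names `button_style`, `mood`, and `colors` come from the description above, and the real files may be shaped differently.

```python
import json


def flatten(obj, prefix=""):
    """Recursively flatten nested dicts and lists into one flat dict.

    Nested keys are joined with dots; list items use their index as the key.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot left by the parent prefix.
        flat[prefix.rstrip(".")] = obj
    return flat


# Hypothetical creative metadata; the real schema is not shown in this README.
raw = """
{
  "creative": {
    "button_style": "rounded",
    "mood": "playful",
    "colors": ["#ff0000", "#00ff00"]
  }
}
"""

record = flatten(json.loads(raw))
for column, value in record.items():
    print(column, "=", value)
```

Each flattened record can then be joined to the time-series rows as a set of static features for that creative.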

3. Modeling Strategy (modeling.py)

  • Model: GRU (Gated Recurrent Unit) implemented via the Darts library (RNNModel).
  • Objective: Minimize Mean Absolute Error (MAE) on the validation set.
  • Search Space:
    • hidden_dim: 16 - 64 (Size of the internal memory state)
    • n_rnn_layers: 1 - 3 (Depth of the network)
    • dropout: 0.1 - 0.4 (Regularization)
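
The search space above could be expressed as a suggestion function like the one sketched here. `suggest_int` and `suggest_float` are the method names Optuna's `Trial` object provides; everything else (`suggest_gru_params`, `objective`, `toy_fit_and_score`, and the `LowerBoundTrial` stand-in that lets the snippet run without Optuna installed) is hypothetical and not taken from modeling.py.

```python
class LowerBoundTrial:
    """Stand-in for an Optuna trial so this sketch runs on its own.

    It always returns the lower bound of each range. In a real study,
    Optuna passes its own Trial object and samples the values itself.
    """

    def __init__(self):
        self.params = {}

    def suggest_int(self, name, low, high):
        self.params[name] = low
        return low

    def suggest_float(self, name, low, high):
        self.params[name] = low
        return low


def suggest_gru_params(trial):
    """Sample one configuration from the ranges listed in the README."""
    return {
        "hidden_dim": trial.suggest_int("hidden_dim", 16, 64),
        "n_rnn_layers": trial.suggest_int("n_rnn_layers", 1, 3),
        "dropout": trial.suggest_float("dropout", 0.1, 0.4),
    }


def objective(trial, fit_and_score):
    """Return the validation MAE for one sampled configuration."""
    params = suggest_gru_params(trial)
    return fit_and_score(params)


def toy_fit_and_score(params):
    # Placeholder for "train a GRU and return validation MAE".
    # The formula is arbitrary and only exists to make the sketch runnable.
    return abs(params["hidden_dim"] - 32) / 32 + params["dropout"]


params = suggest_gru_params(LowerBoundTrial())
score = objective(LowerBoundTrial(), toy_fit_and_score)
print(params, score)

# With Optuna installed, the same objective plugs into a study
# (the trial count below is an arbitrary example value):
#   import optuna
#   study = optuna.create_study(direction="minimize")
#   study.optimize(lambda t: objective(t, toy_fit_and_score), n_trials=20)
```

The sampled dictionary uses the same parameter names as Darts' `RNNModel`, so it could be passed as keyword arguments alongside `model="GRU"`. One caveat: PyTorch's recurrent layers apply dropout between stacked layers, so the `dropout` value may have no effect when `n_rnn_layers` is 1.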

4. Evaluation (utils.py)

The pipeline evaluates models using a suite of regression and business metrics:

  • MAE / RMSE: For raw error magnitude.
  • sMAPE: For relative error that stays defined when the actual value is zero (unless the forecast is also zero).
  • Bias (ME): To detect systematic over/under-forecasting.
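
For reference, the metrics listed above and the efficiency baseline from the Key Features section can be written in a few lines. This is a sketch under stated assumptions, not the code in utils.py: the function names, the sign convention for bias (forecast minus actual), the handling of zero/zero pairs in sMAPE, and the way the baseline turns efficiency into a forecast (efficiency multiplied by future spend) are all choices made here for illustration.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def smape(actual, forecast):
    """Symmetric MAPE in percent.

    Pairs where actual and forecast are both zero count as zero error
    (an assumption; other conventions exist).
    """
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        total += abs(a - f) / denom if denom else 0.0
    return 100 * total / len(actual)


def mean_error(actual, forecast):
    """Bias: positive means over-forecasting under this sign convention."""
    return sum(f - a for a, f in zip(actual, forecast)) / len(actual)


def efficiency_baseline(train_leads, train_spend, future_spend):
    """Baseline forecast: historical leads-per-spend times planned spend."""
    efficiency = sum(train_leads) / sum(train_spend)
    return [efficiency * s for s in future_spend]


# Hypothetical numbers, chosen only to exercise the functions.
actual = [0, 4, 10]
forecast = [2, 4, 8]
print(mae(actual, forecast), rmse(actual, forecast))
print(smape(actual, forecast), mean_error(actual, forecast))

baseline = efficiency_baseline([5, 10, 5], [50, 100, 50], [40, 60, 100])
print(baseline)
```

Comparing the model's MAE against the baseline's MAE on the same validation window gives a simple measure of lift.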

📦 Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/gru-forecasting-pipeline.git
    cd gru-forecasting-pipeline
  2. Install dependencies:

    pip install -r requirements.txt

⚙️ Usage

  1. Configuration: Update config.py with your local data paths:

    PERFORMANCE_CSV_PATH = "./data/database_full.csv"
    CREATIVE_JSON_DIR = "./data/jsons/"
  2. Run the Pipeline: Execute the main orchestration script:

    python main.py

    This will ingest data, run feature engineering, optimize the GRU model using Optuna, and output performance metrics.

  3. Run Inference: Generate forecasts for your ads:

    python inference.py

📊 Visualizations

The pipeline automatically generates diagnostic plots in the results/ directory:

  • Forecast vs. Actuals: Time-series overlay of predictions.
  • Residual Analysis: Analysis of error distribution over time.

Author

Luciën Tuijp

About

Sequential forecasting pipeline leveraging Gated Recurrent Units (GRU) to model long-term dependencies in ad performance data. A deep learning baseline comparing RNN architectures against TCNs for temporal dynamics.
