This repository contains the programmatic implementation of my (in progress) undergraduate thesis research for MMSS at Northwestern University. The objective of this research is to compare the performance of Bayesian and frequentist (grid and random) hyperparameter tuning methods for X-learner models (from EconML) in the context of heterogeneous treatment effect estimation.
Causal ML combines machine learning algorithms with econometric identification strategies to estimate treatment effects; heterogeneous treatment effect estimation is particularly valuable in observational settings where the efficacy of intervention regimes benefits from personalization. Despite promising advancements in causal ML, empirical applications remain limited by challenges that arise on both the causal and the algorithmic sides, especially the question of model configuration and tuning. In particular, hyperparameter tuning is frequently ad hoc or neglected in causal applications, undermining model performance and the broader empirical claims made about treatment effects.
This thesis seeks to address the issue of hyperparameter tuning in the development of causal ML models, asking whether a Bayesian approach to hyperparameter tuning can deliver better performance than the standard frequentist-style automatic tuning methods. The analysis focuses on causal ML methods for estimating CATE, evaluating how such tuning strategies affect the credibility of estimated treatment effects, conceptualizing the problem of tuning from two opposing statistical philosophies.
All main scripts are stored in the `src` module. Corresponding pytest modules for some of these scripts can be found in the `tests` folder.
```
bayesian-tuning-metalearners/
├── src/
│   ├── dgp.py
│   ├── xlearner.py
│   ├── metrics_helpers.py
│   ├── tuning.py
│   ├── convergence.py
│   ├── experiment.py
│   ├── main.py
│   └── experiment_configs/
│       ├── config_1d.py
│       ├── config_2d.py
│       ├── config_4d.py
│       └── config_6d.py
├── pyproject.toml
└── README.md
```
src/
`dgp.py` contains the `SimulatedDataset` class and the data-generating function `simulate_dataset()` used in these simulations. The DGP is based on the procedure in Künzel et al. (2019) with the following modifications:
- Confounding, prognostic, and effect-modifier covariates are specified.
- The correlation matrix is constructed from a randomly generated eigenvector.
- The individual response functions (and thus the true treatment effect) and the propensity score function are original.
- Observed outcomes `Y0`, `Y1` are constructed from their respective response functions `mu0`, `mu1` but share the same normally distributed error term to control noise.
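The shared-error construction can be sketched as follows. The covariate, propensity, and response forms below are illustrative placeholders only, not the actual functions in `dgp.py` (which also imposes a correlation structure on the covariates):

```python
import numpy as np

def simulate_dataset_sketch(n=1000, d=5, seed=0):
    """Illustrative DGP: covariates, propensity, and potential outcomes
    built from a single shared error term (placeholder functional forms)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))              # covariates (correlation structure omitted)
    e = 1 / (1 + np.exp(-X[:, 0]))           # placeholder propensity score
    W = rng.binomial(1, e)                   # treatment assignment
    mu0 = X[:, 1]                            # placeholder control response function
    mu1 = X[:, 1] + 0.5 * X[:, 2]            # placeholder treated response function
    eps = rng.normal(size=n)                 # ONE error draw shared by Y0 and Y1
    Y0, Y1 = mu0 + eps, mu1 + eps            # shared noise controls outcome variability
    Y = np.where(W == 1, Y1, Y0)             # observed outcome
    tau = mu1 - mu0                          # true CATE
    return X, W, Y, tau
```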
`xlearner.py` contains the implementation of the X-learner model used in this study, the `XLearnerWrapper` class. This object inherits `BaseEstimator` from `sklearn.base` and wraps the `XLearner` object from EconML's `metalearners` module.
`metrics_helpers.py` contains the helper functions used in cross-validation and model evaluation, calculating observed-outcome MSE, PEHE/PEHE plug-in, and TAU plug-in.
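For reference, PEHE is the root mean squared error of the CATE estimates, and its plug-in variant substitutes a cross-predicted proxy when the true effect is unavailable. The function names and signatures below are illustrative, not the actual helpers in `metrics_helpers.py`:

```python
import numpy as np

def pehe(tau_hat, tau_true):
    """Precision in Estimating Heterogeneous Effects: RMSE of CATE estimates."""
    tau_hat, tau_true = np.asarray(tau_hat), np.asarray(tau_true)
    return float(np.sqrt(np.mean((tau_hat - tau_true) ** 2)))

def pehe_plugin(tau_hat, tau_plug):
    """Plug-in PEHE: same formula, evaluated against a cross-predicted
    proxy for the true CATE rather than the (unobservable) truth."""
    return pehe(tau_hat, tau_plug)
```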
`tuning.py` contains all tuning implementations: `grid_search()`, `random_search()`, and `bayesian_search()`. All tuning functions use the same internal cross-validation process, differing only in their search algorithms. Each returns the fitted model, the best parameters, the best score achieved after tuning, and a history of scores and parameters at each iteration.
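The shared structure can be sketched with a minimal random-search tuner. Everything here is illustrative (the actual functions in `tuning.py` take different arguments); the point is the common CV loop and the four-part return value described above:

```python
import numpy as np
from sklearn.model_selection import KFold

def random_search_sketch(make_model, score_fn, param_space, X, y, n_iter=10, seed=0):
    """Illustrative tuner: sample parameters, score by k-fold CV, keep a
    full history; returns (model, params, best_score, history)."""
    rng = np.random.default_rng(seed)
    history, best = [], (None, None, -np.inf)
    for _ in range(n_iter):
        params = {k: rng.choice(v) for k, v in param_space.items()}
        scores = []
        # Same internal cross-validation process for every search algorithm.
        for tr, va in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
            m = make_model(**params).fit(X[tr], y[tr])
            scores.append(score_fn(m, X[va], y[va]))
        mean_score = float(np.mean(scores))
        history.append((params, mean_score))
        if mean_score > best[2]:
            # Refit the best configuration on the full training data.
            best = (make_model(**params).fit(X, y), params, mean_score)
    return best[0], best[1], best[2], history
```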
`convergence.py` contains the `ConvergenceTracker` class, which is used to track the convergence of the tuning process. It stores the best score and parameters at each iteration and can be used to generate convergence plots.
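The core bookkeeping can be sketched as follows; the attribute names and the lower-is-better assumption are illustrative, not the actual interface of `ConvergenceTracker`:

```python
class ConvergenceTrackerSketch:
    """Illustrative tracker: records the score and parameters of every
    iteration plus the running best, ready for a convergence plot."""

    def __init__(self):
        self.scores, self.params, self.best_scores = [], [], []

    def update(self, params, score):
        self.scores.append(score)
        self.params.append(params)
        # Running best; lower-is-better is an assumption for this sketch.
        best = min(self.best_scores[-1], score) if self.best_scores else score
        self.best_scores.append(best)
```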
`experiment.py` implements one Monte-Carlo simulation using R repetitions. The experimental workflow is as follows:
- Given a data-generating function, base learner configuration, tuner configuration, and R value:
  - For each Monte-Carlo repetition 1 through R, simulate a training and test dataset using the same DGP parameters `dgp_params` and a different random seed. Then construct an `XLearner` with parameters given by the base learner setup, `learner_config`.
  - For each tuner configuration in `tuners`, tune an XLearner model on the training data. Then, use the test data to estimate CATE `tau_hat` and cross-predict `tau_plug`, and calculate PEHE and PEHE plug-in values.
  - Store the learner-tuner combination and its resulting PEHE and PEHE plug-in values in `raw_results`, and generate a summary table containing the mean and variance of both PEHE metrics across Monte-Carlo repetitions.
`main.py` runs the entire pipeline, executing all specified experiments and storing their results in an output directory with the following format:
```
bayesian-tuning-metalearners/
└── results_R_{R value}/
    ├── x_cb/
    │   ├── 1d/
    │   │   ├── raw_results.csv
    │   │   ├── summary.csv
    │   │   └── convergence/
    │   │       ├── random/
    │   │       │   ├── convergence_R{r}.csv
    │   │       │   └── ...
    │   │       └── bayes/
    │   │           ├── convergence_R{r}.csv
    │   │           └── ...
    │   ├── 2d/ ...
    │   ├── 4d/ ...
    │   └── 6d/ ...
    └── x_rf/
        ├── 1d/ ...
        ├── 2d/ ...
        ├── 4d/ ...
        └── 6d/ ...
```
This project requires Python 3.10 or above and uses Poetry for dependency management. To run the experiments on your local machine, do the following:
First, make a local clone of this repository. Then activate the virtual environment:

```shell
source .venv/bin/activate
```

Next, install the required dependencies:

```shell
poetry install
```

To run the experiment pipeline:

```shell
python -m src.main
```
Note that the experiment configurations can be altered. To change them, edit or create a new configuration file in `src/experiment_configs` and update the filepath in the first section of `main.py`. Happy experimentation :)
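As a rough orientation, a configuration module might look like the sketch below. All names and values here are hypothetical; the actual contents of `config_1d.py` and its siblings may differ:

```python
# Hypothetical configuration module in the style of src/experiment_configs/.
# Every name and value below is illustrative, not taken from config_1d.py.
DGP_PARAMS = {"n": 2000, "seed": 42}        # data-generating settings
LEARNER_CONFIG = {"n_estimators": 200}      # base learner settings
TUNERS = ["grid", "random", "bayes"]        # tuning strategies to compare
R = 10                                      # Monte-Carlo repetitions
```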
Notes
- Random & Bayesian search: `n_iter` set to 10 * d for a d-dimensional search space
- Base learners: `catboost_info` generation
- XLearner: `models` vs `propensity_model` vs `cate_models`