Valentyn1997/OAR

Overlap-Adaptive Regularization

Tackling the issue of low overlap in meta-learners with adaptive regularization.

The project is built with the following Python libraries:

  1. PyTorch
  2. Hydra - simplified management of command-line arguments
  3. MLflow - experiment tracking

Setup

Installations

First, create a virtual environment and install all the requirements:

pip3 install virtualenv
python3 -m virtualenv -p python3 --always-copy venv
source venv/bin/activate
pip3 install -r requirements.txt

MLflow Setup / Connection

To start an experiment-tracking server, run:

mlflow server --port=5000 --gunicorn-opts "--timeout 280"

To access the MLflow web UI with all the experiments from a local machine, forward the server port via ssh:

ssh -N -f -L localhost:5000:localhost:5000 <username>@<server-link>

Then open http://localhost:5000 in a local browser.
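
Before opening the browser, it can help to confirm that something is actually listening on the forwarded port. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical and not part of this repository, and the check only tests TCP reachability, not that the listener is an MLflow server.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 5000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if is_port_open():
    print("localhost:5000 is reachable; open http://localhost:5000 in a browser.")
else:
    print("Nothing is listening on localhost:5000; check the server and the ssh tunnel.")
```

If the check fails, the usual suspects are a server that was not started or an ssh tunnel that has exited.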

Semi-synthetic datasets setup

Before running semi-synthetic experiments, place datasets in the corresponding folders:

data/acic_2016
├── synth_outcomes
│   ├── zymu_<id0>.csv
│   ├── ...
│   └── zymu_<id14>.csv
├── ids.csv
└── x.csv
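
The layout above can be checked before launching an experiment. This is a minimal sketch, assuming the folder structure shown in the tree; the function `missing_acic_files` is a hypothetical helper, not part of the repository, and it only checks that files exist, not that their contents are valid.

```python
from pathlib import Path
from typing import List


def missing_acic_files(root: str = "data/acic_2016") -> List[str]:
    """List the expected ACIC 2016 entries that are absent under `root`."""
    base = Path(root)
    # Top-level files shown in the tree above.
    missing = [name for name in ("ids.csv", "x.csv") if not (base / name).is_file()]
    # At least one synthetic-outcome file named zymu_<id>.csv is expected.
    outcomes = base / "synth_outcomes"
    if not outcomes.is_dir() or not any(outcomes.glob("zymu_*.csv")):
        missing.append("synth_outcomes/zymu_*.csv")
    return missing


print(missing_acic_files())
```

An empty list means every expected entry was found; otherwise the list names what still has to be placed in the folder.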

Experiments

A single training script, runnables/train.py, handles all methods and datasets. For details on mandatory arguments, see the main configuration file config/config.yaml and the other files in the config/ folder.

The generic command, with logging and a fixed random seed, is:

PYTHONPATH=. python3 runnables/train.py +dataset=<dataset> +model=<model> exp.seed=10
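
To repeat a run over several seeds, the generic command can be assembled programmatically. The sketch below only builds and prints the command lines, it does not execute them; `train_command` is a hypothetical helper, and the seeds 11 and 12 are illustrative values that do not come from the repository.

```python
import shlex
from typing import List


def train_command(dataset: str, model: str, seed: int, *overrides: str) -> List[str]:
    """Build the argument list for runnables/train.py with Hydra-style overrides."""
    return [
        "python3",
        "runnables/train.py",
        f"+dataset={dataset}",
        f"+model={model}",
        f"exp.seed={seed}",
        *overrides,  # any further key=value overrides, passed through unchanged
    ]


# Print one command line per seed (illustrative seeds).
for seed in (10, 11, 12):
    print("PYTHONPATH=. " + shlex.join(train_command("synthetic", "target_net", seed)))
```

The resulting lines can be pasted into a shell, or the argument list can be handed to `subprocess.run` with `PYTHONPATH` set in the environment.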

Datasets

Specify a dataset or dataset generator with +dataset=<dataset> (for example, +dataset=synthetic or +dataset=ihdp100, as in the examples below), plus any additional parameters, e.g. the training set size for synthetic data (dataset.n_samples_train=250) or a subset index for ACIC 2016 data (dataset.dataset_ix=0).

Models

The best hyperparameters are already saved for each model-dataset pair. They can be loaded via +<dataset>_hparams=<dataset>, +<dataset>_hparams=<dataset_ix>, etc. (e.g. +ihdp100_hparams=ihdp100).

Stage 1.

Stage 1 models are propensity networks (src/models/prop_nets.py) and outcome networks (src/models/mu_nets.py). To perform manual hyperparameter tuning, use the flags prop_net_cov.tune_hparams=True and mu_net_cov.tune_hparams=True.

Stage 2.

Stage 2 models are defined in config/config.yaml and src/models/target_model.py. Specify a second-stage model (target net, +model=target_net, or target kernel ridge regression, +model=target_krr) and the following regularization parameters:

  • target_net.regularization.adaptive: False - constant regularization, True - overlap-adaptive regularization (OAR)
  • target_net.regularization.type: noise - noise regularization (for +model=target_net), dropout - dropout (for +model=target_net), l2 - RKHS norm (for +model=target_krr)
  • target_net.regularization.coeff: mult - multiplicative regularization function ($\lambda_{\mathrm{m}}$), log - logarithmic regularization function ($\lambda_{\log}$), mult2 - squared multiplicative regularization function ($\lambda_{\mathrm{m}^2}$)
  • target_net.regularization.efficient: False - OAR, True - dOAR
  • target_net.regularization.base_value: constant value of regularization $\lambda/p$ for rescaling
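
Because these options are passed as plain strings, a typo only surfaces once training starts. The following is a minimal sketch of a pre-flight check that validates the values used in the list and examples of this README and returns the matching overrides. The helper `regularization_overrides` is hypothetical, and reusing the `target_net.regularization` prefix for `+model=target_krr` is an assumption, so the prefix is left configurable.

```python
from typing import List

# Regularization types per second-stage model, as listed above.
ALLOWED_TYPES = {"target_net": {"noise", "dropout"}, "target_krr": {"l2"}}
# Regularization functions, as used in the examples of this README.
ALLOWED_COEFFS = {"mult", "log", "mult2"}


def regularization_overrides(
    model: str,
    reg_type: str,
    coeff: str,
    adaptive: bool = True,
    efficient: bool = False,
    base_value: float = 0.5,
    prefix: str = "target_net.regularization",  # assumption: may differ for target_krr
) -> List[str]:
    """Validate the options and return them as key=value command-line overrides."""
    if model not in ALLOWED_TYPES:
        raise ValueError(f"unknown model: {model!r}")
    if reg_type not in ALLOWED_TYPES[model]:
        raise ValueError(f"type {reg_type!r} is not listed for model {model!r}")
    if coeff not in ALLOWED_COEFFS:
        raise ValueError(f"unknown coeff: {coeff!r}")
    return [
        f"{prefix}.adaptive={adaptive}",
        f"{prefix}.type={reg_type}",
        f"{prefix}.coeff={coeff}",
        f"{prefix}.efficient={efficient}",
        f"{prefix}.base_value={base_value}",
    ]


print(" ".join(regularization_overrides("target_net", "noise", "mult")))
```

The printed string can be appended to the training command; an invalid combination such as `l2` with `target_net` raises a `ValueError` before any run is launched.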

Examples

Example of running target net with OAR($\lambda_{\mathrm{m}}$) noise regularization w/o hyper-parameter tuning, rescaled to $\lambda = 0.5$, based on synthetic data:

CUDA_VISIBLE_DEVICES=<devices> PYTHONPATH=. python3 runnables/train.py -m +dataset=synthetic +model=target_net +synthetic_hparams=\'250\' exp.logging=True exp.device=cuda exp.seed=10 target_net.regularization.adaptive=True target_net.regularization.type=noise target_net.regularization.coeff=mult target_net.regularization.efficient=False target_net.regularization.base_value=0.5

Example of running target net with dOAR($\lambda_{\log}$) dropout w/ hyper-parameter tuning, rescaled to $\lambda = 0.1$, based on the IHDP dataset (first subset):

CUDA_VISIBLE_DEVICES=<devices> PYTHONPATH=. python3 runnables/train.py -m +dataset=ihdp100 +model=target_net +ihdp100_hparams=ihdp100 exp.logging=True exp.device=cuda exp.seed=10 dataset.dataset_ix=0 prop_net_cov.tune_hparams=True mu_net_cov.tune_hparams=True target_net.regularization.adaptive=True target_net.regularization.type=dropout target_net.regularization.coeff=log target_net.regularization.efficient=True target_net.regularization.base_value=0.1 

About

Code for the paper "Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation"
