
Causal Inference

jcanny edited this page May 13, 2014 · 32 revisions

IPTW

BIDMach has several basic causal estimators. IPTW stands for Inverse Probability of Treatment Weighting and is a widely used technique for causal inference with binary treatments. We start with some data features <math>X</math>, a response <math>Y</math>, and a "treatment" <math>Z</math>. In causal inference we are interested in the effect on <math>Y</math> of directly changing <math>Z</math>. This is different from the conditional probability of <math>Y</math> given <math>Z</math>, which depends on the joint probability distribution of the system "as is". The causal effect instead models a change to the system where we force <math>Z</math> to a new value. The simplest approach is to regress <math>Y</math> on <math>X,Z</math>, using linear or logistic regression. The coefficient of <math>Z</math> in the regression model captures the direct influence of <math>Z</math> on <math>Y</math>. If the regression model is exact, i.e. if <math>Y</math> really is a linear function of the inputs, then this coefficient accurately captures the influence of <math>Z</math> on <math>Y</math>. (For logistic regression we instead use the difference in predictions <math>L(1,X)-L(0,X)</math> for a particular input <math>X</math>, where <math>L(Z,X)</math> is the logistic predictor.)
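The regression approach can be sketched as follows. This is a hypothetical numpy illustration (not BIDMach code): we simulate data with a known treatment effect and read the effect off the <math>Z</math> coefficient of a least-squares fit.

```python
import numpy as np

# Hypothetical sketch: estimate the causal effect of a binary treatment Z
# by reading off its coefficient in a linear regression of Y on [X, Z].

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 3))               # covariates
Z = (rng.random(n) < 0.5).astype(float)   # binary treatment
true_effect = 2.0
Y = X @ np.array([1.0, -0.5, 0.3]) + true_effect * Z + rng.normal(size=n)

# Design matrix [X, Z, intercept]; the Z column's coefficient is the
# estimated direct effect of Z on Y under the linear model.
D = np.column_stack([X, Z, np.ones(n)])
coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
z_effect = coef[3]  # close to true_effect when the linear model is exact
```

When the linear model is exact, as here, `z_effect` recovers the true effect; the rest of the section deals with the case where it is not.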

However, a regression model often won't be exact, and a different kind of estimate is needed. The next approach is to simulate randomization of <math>Z</math>. If we had randomly assigned each user to classes <math>Z=0</math> and <math>Z=1</math>, we could simply use the difference in responses as the causal effect. This is the approach taken in randomized trials. But given a dataset, we can't change the assignments to <math>Z</math> that were made. The actual assignment could depend in an arbitrary fashion on the other features <math>X</math>, and then the difference in response will depend on the influence of those features as well as of <math>Z</math>. We could partition or cluster the data according to <math>X</math>, and then look within each subset at the difference in outcome between samples with <math>Z=0</math> and <math>Z=1</math>. But this is very difficult with high-dimensional features, and there turns out to be a much more efficient approach based on the *propensity score*. The propensity score is the probability of the treatment assignment, i.e. <math>P(Z|X)</math>, and the notations <math>g_0(X)=P(Z=0|X)</math> and <math>g_1(X)=P(Z=1|X)</math> are often used. Stratification by the propensity score turns out to be sufficient for accurate estimation of the causal effect.
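In practice the propensity score is unknown and must itself be estimated, typically by a logistic regression of <math>Z</math> on <math>X</math>. A minimal hypothetical sketch (plain numpy gradient ascent rather than BIDMach's learners):

```python
import numpy as np

# Hypothetical sketch: estimate the propensity score g_1(X) = P(Z=1|X)
# with a logistic regression of Z on X, fitted by gradient ascent.

def fit_propensity(X, Z, lr=0.1, steps=2000):
    """Return weights w such that sigmoid([X, 1] @ w) estimates P(Z=1|X)."""
    Xb = np.column_stack([X, np.ones(len(X))])   # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))        # current estimate of P(Z=1|X)
        w += lr * Xb.T @ (Z - p) / len(Z)        # log-likelihood gradient step
    return w

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
# Treatment assignment depends on X: confounded, not randomized.
p_true = 1.0 / (1.0 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
Z = (rng.random(n) < p_true).astype(float)

w = fit_propensity(X, Z)
g1 = 1.0 / (1.0 + np.exp(-np.column_stack([X, np.ones(n)]) @ w))  # fitted g_1(X)
```

Here `g1` tracks the true assignment probabilities closely, and <math>g_0(X)</math> is simply `1 - g1`.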

For a regression-based estimate, we can treat the propensity score as a sampling bias. By dividing by it, we create a pseudo-sample in which units have equal (and random) probability of assignment to <math>Z=0</math> or <math>Z=1</math>. Then the IPTW estimator for the causal effect is defined as:

<math>E(Y|Z=1)-E(Y|Z=0) = \frac{1}{n}\sum_{i=1}^n \left( \frac{Z_i Y_i}{g_1(X_i)} - \frac{(1-Z_i)Y_i}{g_0(X_i)}\right)</math>
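The estimator above can be sketched in a few lines. This is a hypothetical numpy illustration using the true propensity scores (in practice they would be estimated, e.g. by logistic regression); it also shows the bias of the naive difference in means that IPTW corrects.

```python
import numpy as np

# Hypothetical sketch of the IPTW estimator: weight each treated outcome
# by 1/g_1(X_i) and each control outcome by 1/g_0(X_i), then average the
# difference. True propensity scores are used here for clarity.

rng = np.random.default_rng(2)
n = 200_000
X = rng.normal(size=n)
g1 = 1.0 / (1.0 + np.exp(-X))       # P(Z=1|X): assignment depends on X
Z = (rng.random(n) < g1).astype(float)
true_effect = 1.5
Y = 2.0 * X + true_effect * Z + rng.normal(size=n)

# Naive difference in means is biased because Z is confounded with X.
naive = Y[Z == 1].mean() - Y[Z == 0].mean()

# IPTW estimate of E(Y|Z=1) - E(Y|Z=0): close to the true effect.
iptw = np.mean(Z * Y / g1 - (1 - Z) * Y / (1 - g1))
```

The reweighting creates the pseudo-sample described above, in which treatment assignment is effectively independent of <math>X</math>.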

A-IPTW

A-IPTW (Augmented IPTW) is an augmented form of IPTW which guards against errors in the propensity score model. It adds an outcome regression model <math>Q(Z,X)</math> predicting <math>Y</math>, and is doubly robust: the estimate remains consistent if either the propensity model or the outcome model is correctly specified.
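A hypothetical sketch of the standard A-IPTW estimator (not BIDMach code), continuing the simulated example: the IPTW term is applied to the residuals <math>Y - Q(Z,X)</math> and the outcome-model predictions are added back.

```python
import numpy as np

# Hypothetical sketch of the standard A-IPTW (doubly robust) estimator.
# Q(z, X) is an outcome model predicting Y from X at treatment level z.

rng = np.random.default_rng(3)
n = 200_000
X = rng.normal(size=n)
g1 = 1.0 / (1.0 + np.exp(-X))          # true propensity P(Z=1|X)
Z = (rng.random(n) < g1).astype(float)
true_effect = 1.5
Y = 2.0 * X + true_effect * Z + rng.normal(size=n)

# Outcome model: least squares on [X, Z, 1] (correctly specified here).
D = np.column_stack([X, Z, np.ones(n)])
b, *_ = np.linalg.lstsq(D, Y, rcond=None)
Q1 = np.column_stack([X, np.ones(n), np.ones(n)]) @ b    # Q(1, X)
Q0 = np.column_stack([X, np.zeros(n), np.ones(n)]) @ b   # Q(0, X)

# A-IPTW: inverse-weighted residual correction plus model prediction.
aiptw = (np.mean(Z * (Y - Q1) / g1 + Q1)
         - np.mean((1 - Z) * (Y - Q0) / (1 - g1) + Q0))
```

If `g1` were misspecified but `Q` correct (or vice versa), the estimate would still converge to the true effect, which is the sense in which A-IPTW guards against model errors.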
