
Causal Inference

jcanny edited this page May 13, 2014 · 32 revisions

IPTW

BIDMach has several basic causal estimators. IPTW stands for Inverse Probability of Treatment Weighting and is a widely used technique for causal inference with binary treatments. We start with some data features <math>X</math>, a response <math>Y</math>, and a "treatment" <math>Z</math>. In causal inference we are interested in the effect on <math>Y</math> of directly changing <math>Z</math>. This is different from the conditional probability of <math>Y</math> given <math>Z</math>, which depends on the joint probability distribution of the system "as is". The causal effect instead models a change to the system where we force <math>Z</math> to a new value. The simplest approach is to regress <math>Y</math> on <math>X,Z</math>, using linear or logistic regression. The coefficient of <math>Z</math> in the regression model captures the direct influence of <math>Z</math> on <math>Y</math>. If the regression model is exact, i.e. if <math>Y</math> really is a linear function of the inputs, then this coefficient accurately captures the influence of <math>Z</math> on <math>Y</math>. (For logistic regression we instead use the difference in predictions <math>L(1,X)-L(0,X)</math> for a particular input <math>X</math>, where <math>L(Z,X)</math> is the logistic predictor.)
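The regression approach can be illustrated with a small simulation (plain Python, not BIDMach code; the data-generating model and all variable names here are invented for illustration). When the outcome really is linear in <math>X</math> and <math>Z</math>, ordinary least squares recovers the causal coefficient of <math>Z</math> even though <math>Z</math> is assigned depending on <math>X</math>:

```python
import math
import random

# Hypothetical illustration: Y = 0.5 + 1.5*X + 2.0*Z + noise, with Z
# assigned with probability sigmoid(X), so Z and X are correlated.
random.seed(0)
n = 20000
rows, y = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    z = 1.0 if random.random() < 1 / (1 + math.exp(-x)) else 0.0
    rows.append([1.0, x, z])                              # intercept, X, Z
    y.append(0.5 + 1.5 * x + 2.0 * z + random.gauss(0, 1))  # true effect of Z is 2.0

# Solve the 3x3 normal equations (R^T R) beta = R^T y by Gauss-Jordan elimination.
k = 3
A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
for i in range(k):
    p = A[i][i]
    A[i] = [v / p for v in A[i]]
    b[i] /= p
    for r in range(k):
        if r != i:
            f = A[r][i]
            A[r] = [vr - f * vi for vr, vi in zip(A[r], A[i])]
            b[r] -= f * b[i]

print(round(b[2], 2))   # coefficient of Z, close to the true effect 2.0
```

Because the linear model is correctly specified here, the estimate is accurate; the sections below address what to do when it is not.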

However, a regression model often won't be exact, and a different kind of estimate is needed. The next approach is to simulate randomization of <math>Z</math>. If we had randomly assigned each user to classes <math>Z=0</math> and <math>Z=1</math>, we could simply use the difference in responses as the causal effect. This is the approach taken in randomized trials. But given a dataset, we can't change the assignments to <math>Z</math> that were made. The actual assignment could depend in an arbitrary fashion on the other features <math>X</math>, and then the difference in response will depend on the influence of those features as well as of <math>Z</math>. We could partition or cluster the data according to <math>X</math>, and then look within each subset at the difference in outcome for samples with <math>Z=0</math> and <math>Z=1</math>. But this is very difficult with high-dimensional features, and there turns out to be a much more efficient approach based on the *propensity score*. The propensity score is the probability of the treatment assignment, i.e. <math>P(Z|X)</math>, and the notations <math>g_0(X)=P(Z=0|X)</math> and <math>g_1(X)=P(Z=1|X)</math> are often used. Stratification by the propensity score turns out to be sufficient for accurate estimation of the causal effect.

For a regression-based estimate, we can treat the propensity score as a sampling bias. By dividing by it, we create a pseudo-sample in which units have equal (and random) probability of assignment to <math>Z=0</math> or <math>Z=1</math>. Then the IPTW estimator for the causal effect is defined as:

<math>E(Y_1)-E(Y_0) = \frac{1}{n}\sum\limits_{i=1}^n \left( \frac{Z_i Y_i}{g_1(X_i)} - \frac{(1-Z_i)Y_i}{g_0(X_i)}\right)</math>

where <math>Y_0</math> and <math>Y_1</math> are the responses <math>Y</math> when <math>Z</math> is forced respectively to 0 or 1. There is a simple estimator for IPTW effects in the causal package. It concurrently computes <math>P(Z|X)</math> using logistic regression and evaluates the estimator above. It expects a targets option which encodes both the treatment(s) <math>Z</math> and the effects <math>Y</math>. It can analyze k effects at once, in which case targets should be a 2k x nfeats matrix. The first k rows encode the k treatments, and the next k rows encode the corresponding effects. The treatments and effects should therefore be among the input features, and the <math>j^{th}</math> row of targets will normally be all zeros with a single 1 in the position of the feature that encodes the <math>j^{th}</math> treatment. The <math>(j+k)^{th}</math> row should have a single 1 in the position of the feature that encodes the <math>j^{th}</math> effect.
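The IPTW sum above can be sketched in a few lines (plain Python, not the BIDMach learner; for clarity this sketch uses the true propensity score, whereas the causal package fits <math>P(Z|X)</math> by logistic regression). It also shows why the naive difference in group means is biased when assignment depends on <math>X</math>:

```python
import math
import random

# Hypothetical illustration: Y = 1 + X + 2*Z + noise, Z assigned with
# probability g1(X) = sigmoid(X). True causal effect of Z is 2.0.
random.seed(1)
n = 100000
naive1 = naive0 = n1 = n0 = 0.0
iptw = 0.0
for _ in range(n):
    x = random.gauss(0, 1)
    g1 = 1 / (1 + math.exp(-x))                  # propensity score P(Z=1|X)
    z = 1.0 if random.random() < g1 else 0.0
    y = 1.0 + x + 2.0 * z + random.gauss(0, 1)
    iptw += (z * y / g1 - (1 - z) * y / (1 - g1)) / n   # the IPTW summand
    if z:
        naive1 += y; n1 += 1
    else:
        naive0 += y; n0 += 1

print(round(naive1 / n1 - naive0 / n0, 2))  # naive contrast, biased upward by X
print(round(iptw, 2))                       # IPTW estimate, close to 2.0
```

Dividing each unit's response by its assignment probability reweights the sample as if treatment had been randomized, which removes the confounding through <math>X</math>.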

The output is in the learner's modelmats(1) field. This will be an FMat with k entries corresponding to the k effects.

A-IPTW

Regression estimates will be inaccurate if the outcome <math>Y</math> is not accurately modeled as a linear or logistic function of <math>X</math>. IPTW estimates can be quite inaccurate if the model used for <math>P(Z|X)</math> is poor. Doubly-robust estimators combine regression and propensity score estimators in such a way that if either estimator is accurate, the errors from the other cancel out. In other words, a doubly-robust estimator is accurate if either the base regression model or the propensity score model is accurate. One class of these estimators is called A-IPTW, for Augmented IPTW.

<math>E(Y_1)-E(Y_0) = \frac{1}{n}\sum\limits_{i=1}^n \left( \frac{Z_i Y_i}{g_1(X_i)} - \frac{(1-Z_i)Y_i}{g_0(X_i)}- \left(Z_i-g_1(X_i)\right)\left( \frac{m_1(X_i)}{g_1(X_i)} + \frac{m_0(X_i)}{g_0(X_i)}\right)\right)</math>

where <math>m_0(X_i)=R(X_i,0)</math> and <math>m_1(X_i)=R(X_i,1)</math> are the predictions of a regression model <math>R(X,Z)</math> for <math>Y</math> given <math>X,Z</math>.
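The double-robustness property can be checked numerically (plain Python, not BIDMach code; the data-generating model is invented for illustration). Here the propensity model is deliberately misspecified as a constant 0.5, so plain IPTW is biased, but because the outcome regression <math>R(X,Z)</math> is correct, the A-IPTW correction term cancels the error:

```python
import math
import random

# Hypothetical illustration: Y = 1 + X + 2*Z + noise, true P(Z=1|X) = sigmoid(X).
# We feed both estimators a WRONG propensity model (constant 0.5) but a
# correct regression model m_z(x) = R(x, z) = 1 + x + 2*z.
random.seed(2)
n = 100000
iptw = aiptw = 0.0
for _ in range(n):
    x = random.gauss(0, 1)
    z = 1.0 if random.random() < 1 / (1 + math.exp(-x)) else 0.0
    y = 1.0 + x + 2.0 * z + random.gauss(0, 1)     # true causal effect is 2.0
    g1, g0 = 0.5, 0.5                              # misspecified propensity model
    m1, m0 = 1.0 + x + 2.0, 1.0 + x                # correct R(X,1) and R(X,0)
    iptw += (z * y / g1 - (1 - z) * y / g0) / n
    aiptw += (z * y / g1 - (1 - z) * y / g0
              - (z - g1) * (m1 / g1 + m0 / g0)) / n

print(round(iptw, 2))    # biased: the propensity model is wrong
print(round(aiptw, 2))   # close to 2.0 despite the wrong propensity
```

Symmetrically, with a correct propensity model and a wrong regression model the augmentation term has mean zero, so A-IPTW falls back to the accuracy of plain IPTW.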
