Causal Inference
BIDMach has several basic causal estimators. IPTW stands for Inverse Probability of Treatment Weighting and is a widely used technique for causal inference with binary treatments. We start with some data features <math>X</math>, a response <math>Y</math>, and a "treatment" <math>Z</math>. In causal inference we are interested in the effect on <math>Y</math> of directly changing <math>Z</math>. This is different from the conditional probability of <math>Y</math> given <math>Z</math>, which depends on the joint probability distribution of the system "as is". The causal effect instead models a change to the system where we force <math>Z</math> to a new value. The simplest approach is to regress <math>Y</math> on <math>X,Z</math> using linear or logistic regression. The coefficient of <math>Z</math> in the regression model captures the direct influence of <math>Z</math> on <math>Y</math>. If the regression model is exact, i.e. if <math>Y</math> really is a linear function of the inputs, then this coefficient accurately captures the influence of <math>Z</math> on <math>Y</math>. For logistic regression we instead use the influence of the treatment, which is <math>L(1,X)-L(0,X)</math> for a particular input <math>X</math>, where <math>L(Z,X)</math> is the logistic predictor.
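As a concrete illustration, here is a minimal Scala sketch of the logistic influence <math>L(1,X)-L(0,X)</math>. This is plain Scala, not BIDMach's API; the names b0, bz and w are hypothetical placeholders for fitted model parameters:

```scala
import scala.math.exp

// Logistic predictor L(z, x) = sigmoid(b0 + bz*z + w.x), where
// b0 is the intercept, bz the treatment coefficient, w the feature weights.
def sigmoid(t: Double): Double = 1.0 / (1.0 + exp(-t))

def logisticPredictor(z: Double, x: Array[Double],
                      b0: Double, bz: Double, w: Array[Double]): Double = {
  val wx = w.zip(x).map { case (wi, xi) => wi * xi }.sum
  sigmoid(b0 + bz * z + wx)
}

// Influence of the treatment for a particular input x: L(1, x) - L(0, x).
def influence(x: Array[Double], b0: Double, bz: Double, w: Array[Double]): Double =
  logisticPredictor(1.0, x, b0, bz, w) - logisticPredictor(0.0, x, b0, bz, w)
```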
However, a regression model often won't be exact, and a different kind of estimate is needed. The next approach is to simulate randomization of <math>Z</math>. If we had randomly assigned each user to the classes <math>Z=0</math> and <math>Z=1</math>, we could simply use the difference in responses as the causal effect. This is the approach taken in randomized trials. But given a dataset, we can't change the assignments to <math>Z</math> that were made. The actual assignment could depend in an arbitrary fashion on the other features <math>X</math>, and then the difference in response will depend on the influence of those features as well as <math>Z</math>. We could partition or cluster the data according to <math>X</math> and then, within each subset, compare the outcomes of samples with <math>Z=0</math> and <math>Z=1</math>. But this is very difficult with high-dimensional features, and there turns out to be a much more efficient approach based on the *propensity score*. The propensity score is the probability of the treatment assignment, i.e. <math>P(Z|X)</math>, and the notations <math>g_0(X)=P(Z=0|X)</math> and <math>g_1(X)=P(Z=1|X)</math> are often used. Stratification by the propensity score turns out to be sufficient for accurate estimation of the causal effect.
For a regression-based estimate, we can treat the propensity score as a sampling bias. By dividing by it, we create a pseudo-sample in which units have equal (and random) probability of assignment to <math>Z=0</math> or <math>Z=1</math>. Then the IPTW estimator for the causal effect is defined as:
<math>E(Y|Z=1)-E(Y|Z=0) \approx \frac{1}{n}\sum\limits_{i=1}^n \left( \frac{Z_i Y_i}{g_1(X_i)} - \frac{(1-Z_i)Y_i}{g_0(X_i)}\right)</math>
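A minimal Scala sketch of this estimator, assuming the propensity scores <math>g_1(X_i)</math> have already been estimated (e.g. by logistic regression). The function and argument names are illustrative, not part of BIDMach's API:

```scala
// IPTW estimate of E(Y|Z=1) - E(Y|Z=0) from treatments z (0 or 1),
// responses y, and estimated propensity scores g1(i) = P(Z=1|X_i).
def iptwEffect(z: Array[Double], y: Array[Double], g1: Array[Double]): Double = {
  require(z.length == y.length && y.length == g1.length)
  val terms = z.indices.map { i =>
    val g0 = 1.0 - g1(i)  // for a binary treatment, g0(X_i) = 1 - g1(X_i)
    z(i) * y(i) / g1(i) - (1.0 - z(i)) * y(i) / g0
  }
  terms.sum / z.length
}
```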
There is a simple estimator for IPTW effects in the causal package. It concurrently computes <math>P(Z|X)</math> using logistic regression and evaluates the estimator above. It expects a targmap option, which encodes both the treatment(s) <math>Z</math> and the effects <math>Y</math>. It can analyze k effects at once, and targmap should then be a 2k x nfeats matrix. The first k rows encode the k treatments, and the next k rows encode the corresponding effects. The treatments and effects should themselves be among the input features, so the <math>j^{th}</math> row of targmap will normally be all zeros with a single 1 in the position of the feature that encodes the <math>j^{th}</math> treatment. The <math>(j+k)^{th}</math> row should have a single 1 in the position of the feature that encodes the <math>j^{th}</math> effect.
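To make the layout concrete, here is a hypothetical targmap for k = 2 treatment/effect pairs over nfeats = 6 input features. The feature positions are invented for illustration, and in practice the matrix would be built as a BIDMach FMat rather than a plain Scala array:

```scala
val k = 2
val nfeats = 6
// 2k x nfeats: rows 0..k-1 mark treatment features, rows k..2k-1 mark effects.
val targmap = Array.ofDim[Float](2 * k, nfeats)
targmap(0)(0) = 1f      // row 0:   treatment 1 is encoded by feature 0
targmap(1)(1) = 1f      // row 1:   treatment 2 is encoded by feature 1
targmap(k + 0)(4) = 1f  // row k:   effect 1 is encoded by feature 4
targmap(k + 1)(5) = 1f  // row k+1: effect 2 is encoded by feature 5
```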
The output is in the learner's modelmats(1) field. This will be an FMat with k entries corresponding to the k effects.
A-IPTW is an augmented form of IPTW which guards against errors in the propensity model. It combines the IPTW weights with an outcome regression model, and the resulting estimator is doubly robust: it remains consistent if either the propensity model or the outcome model is correctly specified.
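For reference, a standard textbook form of the A-IPTW estimator (not necessarily the exact form used in BIDMach) augments each IPTW term with an outcome regression <math>Q_z(X) \approx E(Y|Z=z,X)</math>:

<math>\frac{1}{n}\sum\limits_{i=1}^n \left( \frac{Z_i (Y_i - Q_1(X_i))}{g_1(X_i)} + Q_1(X_i) - \frac{(1-Z_i)(Y_i - Q_0(X_i))}{g_0(X_i)} - Q_0(X_i)\right)</math>

When the propensity model <math>g</math> is exact, the regression terms cancel in expectation and this reduces to IPTW; when the outcome model <math>Q</math> is exact, the weighted residual terms vanish in expectation and it reduces to a regression estimate.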