Targeted Maximum Likelihood (TMLE) for Causal Inference

A doubly robust, automated approach to causal inference

Yao Yang, May 17

What is causal inference?

Imagine you have two drugs, A and B, for treating cancer.

You test the drugs on different people and now want to know which one is more effective. How would you test it?

You would probably first randomize the patients, giving half of them drug A and half drug B, and then track key metrics over a period of time to see which drug lowers them more.

Is that enough? No.

A common misunderstanding is that as long as the patients are randomized, we can infer that the different treatments cause the different results.

This is not right.

Randomization balances the groups only on average; in any single finite sample it cannot guarantee that they are equivalent.

And this is where propensity score matching comes into play.

What is propensity score matching then?

The idea behind propensity score matching is that by giving each individual in the study a propensity score, we can compare individuals across treatment groups and make them as equivalent as possible. With the confounding factors controlled in this way, any remaining difference in results should come from the treatment only.

Now you see that propensity score matching requires a proper experiment setup before the drugs are given to the individuals in the study.

We may also end up dropping a lot of individuals because they don’t have matching counterparts.

What if I failed to set up the experiment properly, or the number of matched pairs is too small? Can we still use the data we collected?

Let’s return to the previous case, where two groups receive two different drugs, A and B.

Our aim is to measure the Average Treatment Effect (ATE = E[Y(A=1)] − E[Y(A=0)]) in order to compare the treatments.

We call a patient’s data point X; it may contain features such as the patient’s age, gender, occupation, income, exercise habits, and smoking status. Y is the test metric, the outcome we use to judge the treatment result. The treatment A is a binary variable that takes the values 0 and 1 for the two treatments.
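To make the notation concrete, here is a minimal sketch of why a plain difference in group means can mislead. The counts are hypothetical (not real trial data), X is reduced to a single binary feature, and `make_data` is an illustrative helper of my own, not part of any library.

```python
def make_data():
    """Build rows (x, a, y) from hypothetical cell counts.

    Each cell is (x, a, number of patients, number with y = 1).
    Patients with x = 1 are both more likely to be treated and
    more likely to have y = 1, so X confounds the comparison.
    """
    cells = [(0, 1, 20, 4), (0, 0, 80, 8), (1, 1, 80, 48), (1, 0, 20, 10)]
    rows = []
    for x, a, n, events in cells:
        rows += [(x, a, 1)] * events + [(x, a, 0)] * (n - events)
    return rows


rows = make_data()

# Naive comparison: mean of Y among treated minus mean of Y among controls.
treated = [y for x, a, y in rows if a == 1]
control = [y for x, a, y in rows if a == 0]
naive_diff = sum(treated) / len(treated) - sum(control) / len(control)

# The same comparison made separately within each level of X.
stratum_diff = {}
for level in (0, 1):
    y1 = [y for x, a, y in rows if x == level and a == 1]
    y0 = [y for x, a, y in rows if x == level and a == 0]
    stratum_diff[level] = sum(y1) / len(y1) - sum(y0) / len(y0)

print(round(naive_diff, 3))                                # 0.34
print({k: round(v, 3) for k, v in stratum_diff.items()})   # {0: 0.1, 1: 0.1}
```

In this toy data the naive difference (0.34) is more than three times the difference seen within either level of X (0.1), which is the kind of confounding the rest of the post tries to remove.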

How do we estimate Y(A=1) and Y(A=0) without bias, so that the treatment effect is measured accurately with as little influence from confounding variables as possible?

Here I’ll introduce a state-of-the-art method: targeted maximum likelihood estimation (TMLE).

Targeted learning was proposed by van der Laan & Rubin in 2006 [1] as an automated (as opposed to do-it-yourself) causal inference method.

TMLE is used to analyze censored observational data from a non-controlled experiment in a way that allows effect estimation even in the presence of confounding factors.

Here’s a step-by-step guide to how TMLE works:

Step 1: Generate an initial estimate of E(Y|A, X).

This is what we call g-computation in causal inference. It is a maximum-likelihood-based substitution estimator that relies on estimating the conditional expectation of the outcome given the exposure and the covariates.

This estimator is used to generate the potential outcomes Y₁ and Y₀, corresponding to A=1 and A=0.
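Step 1 can be sketched as follows. This is only an illustration under simplifying assumptions: the data are hypothetical counts with a single binary X, and the outcome model is just the mean of Y in each (A, X) cell rather than an ensemble learner. The names `make_data` and `fit_outcome_model` are my own.

```python
def make_data():
    """Rows (x, a, y) built from hypothetical (x, a, n, events) cell counts."""
    cells = [(0, 1, 20, 4), (0, 0, 80, 8), (1, 1, 80, 48), (1, 0, 20, 10)]
    rows = []
    for x, a, n, events in cells:
        rows += [(x, a, 1)] * events + [(x, a, 0)] * (n - events)
    return rows


def fit_outcome_model(rows):
    """Estimate E(Y | A=a, X=x) by the mean of Y in each (a, x) cell."""
    totals, counts = {}, {}
    for x, a, y in rows:
        totals[(a, x)] = totals.get((a, x), 0) + y
        counts[(a, x)] = counts.get((a, x), 0) + 1
    return {cell: totals[cell] / counts[cell] for cell in counts}


rows = make_data()
q = fit_outcome_model(rows)

# Predict both potential outcomes for every individual,
# whichever treatment that individual actually received.
y1 = [q[(1, x)] for x, a, y in rows]
y0 = [q[(0, x)] for x, a, y in rows]

# G-computation estimate of the ATE: average the predicted differences.
g_comp_ate = sum(p1 - p0 for p1, p0 in zip(y1, y0)) / len(rows)
print(round(g_comp_ate, 3))  # 0.1
```

The key move is that every individual gets a prediction under A=1 and under A=0, so the comparison is made over the same population of X values.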

As for how to obtain the estimator, Mark van der Laan uses a “super learner”, which is essentially an ensemble learning method.

I’ll talk more about G-computation and the ensemble method in future posts.

Now we have the estimator. Remember that this method conditions on the treatment and simply assumes that any bias the other variables introduce into the estimator can be ignored.

But this is not robust enough: we need to ‘target’ the treatment variable and reduce the bias from the other variables in a more robust way.

Step 2: Estimate the exposure mechanism using the propensity score P(A=1|X).

We calculate the conditional probability of being exposed given the observed confounders X.

This is the same propensity score we mentioned earlier that people use to do matching of individuals.

In TMLE, we write the propensity score as π₁ = P(A=1|X) and also calculate its complement π₀ = 1 − π₁ for each individual in the study.
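A minimal sketch of Step 2, again on hypothetical counts with one binary X. Here the propensity score is simply the share of treated individuals at each level of X; with richer covariates one would fit a model instead. `make_data` and `fit_propensity` are illustrative names of my own.

```python
def make_data():
    """Rows (x, a, y) built from hypothetical (x, a, n, events) cell counts."""
    cells = [(0, 1, 20, 4), (0, 0, 80, 8), (1, 1, 80, 48), (1, 0, 20, 10)]
    rows = []
    for x, a, n, events in cells:
        rows += [(x, a, 1)] * events + [(x, a, 0)] * (n - events)
    return rows


def fit_propensity(rows):
    """Estimate P(A=1 | X=x) as the treated fraction at each level of x."""
    treated, counts = {}, {}
    for x, a, y in rows:
        treated[x] = treated.get(x, 0) + a
        counts[x] = counts.get(x, 0) + 1
    return {x: treated[x] / counts[x] for x in counts}


rows = make_data()
pi1_by_x = fit_propensity(rows)

# One pair (pi1, pi0) per individual, looked up from that individual's X.
pi1 = [pi1_by_x[x] for x, a, y in rows]
pi0 = [1 - p for p in pi1]

print(pi1_by_x)  # {0: 0.2, 1: 0.8}
```

Individuals with X=1 are four times as likely to be treated as those with X=0 in this toy data, which is exactly the imbalance the next step corrects for.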

Step 3: Update the initial estimate of E(Y|A, X).

In the first step, we calculated Y₀ and Y₁ through g-computation. Now we need to update them to reduce the bias from confounding variables.

We introduce $H(A, X) = \frac{I(A=1)}{\pi_1} - \frac{I(A=0)}{\pi_0}$.

Let’s understand this more: for each individual, we calculate $H_1 = \frac{1}{\pi_1}$ and $H_0 = \frac{1}{\pi_0}$.

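With each individual’s Y, H₁, and H₀, we can fit a linear model on the logit scale in which the initial estimate logit(Y_a) acts as a fixed intercept (an offset) and only the coefficients on H are estimated.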

$\text{logit}(E^{*}(Y|A, X)) = \text{logit}(Y_a) + \epsilon \times H_a$

Here $\epsilon$ is a fluctuation parameter consisting of two values $(\epsilon_1, \epsilon_0)$ for the model.

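With the fitted $\hat{\epsilon}$, we can generate the updated estimates $\text{logit}(Y_1^{*}) = \text{logit}(Y_1) + \hat{\epsilon}_1 \times H_1$ and $\text{logit}(Y_0^{*}) = \text{logit}(Y_0) + \hat{\epsilon}_0 \times H_0$.

Step 4: Generate the targeted estimate of the target parameter.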

With the new Y₁ and Y₀ that we get for each individual from the last step, we can calculate the targeted estimate $\text{ATE} = \frac{1}{n}\sum_{i=1}^{n} (Y_{1,i} - Y_{0,i})$, where n is the number of individuals.
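The four steps can be put together in a small end-to-end sketch. Everything here is an assumption made for illustration: the counts are hypothetical, X is one binary feature, and the initial outcome estimate deliberately ignores X so that there is bias left for the targeting step to remove. Because H₁ only applies to treated rows and H₀ only to control rows, the two values of ε can be fitted separately; I use a hand-written Newton iteration (`fit_epsilon`, my own helper) instead of a regression library.

```python
import math


def make_data():
    """Rows (x, a, y) built from hypothetical (x, a, n, events) cell counts."""
    cells = [(0, 1, 20, 4), (0, 0, 80, 8), (1, 1, 80, 48), (1, 0, 20, 10)]
    rows = []
    for x, a, n, events in cells:
        rows += [(x, a, 1)] * events + [(x, a, 0)] * (n - events)
    return rows


def expit(z):
    return 1 / (1 + math.exp(-z))


def logit(p):
    return math.log(p / (1 - p))


def fit_epsilon(offsets, h, y, steps=50, tol=1e-12):
    """One-parameter logistic fit with a fixed offset, by Newton's method."""
    eps = 0.0
    for _ in range(steps):
        pred = [expit(o + eps * hi) for o, hi in zip(offsets, h)]
        score = sum(hi * (yi - pr) for hi, yi, pr in zip(h, y, pred))
        info = sum(hi * hi * pr * (1 - pr) for hi, pr in zip(h, pred))
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps


rows = make_data()
n = len(rows)

# Step 1 (deliberately crude): estimate E(Y | A) by arm means, ignoring X.
q_init = {}
for arm in (0, 1):
    ys = [y for x, a, y in rows if a == arm]
    q_init[arm] = sum(ys) / len(ys)

# Step 2: propensity score P(A=1 | X=x) as the treated share at each x.
pi1 = {}
for level in (0, 1):
    arms = [a for x, a, y in rows if x == level]
    pi1[level] = sum(arms) / len(arms)

# Step 3: fit eps1 on treated rows with H1 = 1/pi1,
# and eps0 on control rows with H0 = 1/pi0.
treated = [(x, y) for x, a, y in rows if a == 1]
control = [(x, y) for x, a, y in rows if a == 0]
eps1 = fit_epsilon([logit(q_init[1])] * len(treated),
                   [1 / pi1[x] for x, y in treated],
                   [y for x, y in treated])
eps0 = fit_epsilon([logit(q_init[0])] * len(control),
                   [1 / (1 - pi1[x]) for x, y in control],
                   [y for x, y in control])

# Updated potential outcomes for every individual.
y1_star = [expit(logit(q_init[1]) + eps1 / pi1[x]) for x, a, y in rows]
y0_star = [expit(logit(q_init[0]) + eps0 / (1 - pi1[x])) for x, a, y in rows]

# Step 4: average the updated differences.
initial_ate = q_init[1] - q_init[0]
tmle_ate = sum(u - v for u, v in zip(y1_star, y0_star)) / n
print(round(initial_ate, 3), round(tmle_ate, 3))  # 0.34 0.1
```

In this toy data the crude initial estimate gives 0.34, while the targeted estimate comes back to 0.1, the difference observed within each level of X. That is the double robustness at work: the outcome model was wrong, but the propensity model was right, and the update step used it to repair the estimate.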

I will discuss how TMLE compares to other causal inference methods, as well as how to implement TMLE, in my next posts.

Till next time!

References:

[1] Van der Laan & Rubin, Targeted Maximum Likelihood Learning, 2006.