G-computation in Causal Inference
A counterfactual method for causal inference
Yao Yang, Jun 9

In my previous post 'Targeted Maximum Likelihood (TMLE) for Causal Inference', I mentioned that TMLE outperforms G-computation, as van der Laan & Rubin pointed out in their paper, but I didn't elaborate on how G-computation works.
The G-computation algorithm was first introduced by Robins in 1986 to estimate the causal effect of a time-varying exposure in the presence of time-varying confounders that are themselves affected by exposure, a scenario where traditional regression-based methods fail.
G-computation, also known as the G-formula, belongs to the G-method family, which also includes inverse-probability-weighted marginal structural models and g-estimation of structural nested models.
These methods provide consistent estimates of contrasts (e.g. differences, ratios) of average potential outcomes under a less restrictive set of identification conditions than standard regression methods.
In this post, I'll explain in more detail how G-computation works in causal analysis.
Example

Think of an example: treating HIV. One way to measure the effect of treatment is to test the CD4 count; the higher the CD4 count in a patient's blood, the better the treatment effect.
We name the outcome (CD4 count) Y.
We have two groups of patients: A=1 means receiving a specific treatment, and A=0 means otherwise. Treatment is given at baseline (A_0) and again at follow-up (A_1).
We also have a covariate Z, elevated HIV viral load, which is constant by design at baseline and measured once during follow-up, just prior to the second treatment (Z_1).
Besides these, we also have an unmeasured common cause (U) of HIV viral load (Z) and CD4 count (Y).
To measure the difference, we usually use the average causal effect E(Y1 - Y0). It is a marginal effect because it averages over all individual-level effects in the population.
How do we identify whether the change in CD4 count (Y) is caused by the treatment (A=1) and not by something else?

Assumptions

Figure: causal diagram representing the relation between the treatment at time 0 (A0), HIV viral load just prior to the second round of treatment (Z1), the treatment status at time 1 (A1), the CD4 count measured at the end of follow-up (Y), and an unmeasured common cause (U) of HIV viral load and CD4 count.
We use an average effect to measure the difference; however, we cannot observe every potential outcome, because each individual is either exposed to treatment or not. Since we are not doing propensity score matching, we cannot guarantee that all other conditions are the same across individuals. We therefore need to justify that the average effect we estimate would hold in the whole population.
This is accomplished by making the following assumptions:

Assumption 1: Counterfactual consistency

The consistency rule states that a person's potential outcome under a hypothetical condition that happened to materialize is precisely the outcome that person actually experienced.
This allows us to write P(Y^a = y | Z = z, A = a) = P(Y = y | Z = z, A = a) and thereby identify our average causal effect.

Assumption 2: Exchangeability

Exchangeability implies that the potential outcomes under each exposure are independent of the actual exposures A_0 and A_1.
This is the assumption that says “The data came from a randomized controlled trial”.
If this assumption is true, you will observe a random subset of the distribution of Y(a=0) in the group where A=0, and a random subset of the distribution of Y(a=1) in the group where A=1.
Assumption 3: Positivity

Positivity is the assumption that every individual has a positive probability of receiving each value of the treatment variable.
This assumption ensures that the causal effects are well defined. It is met when there are both exposed and unexposed individuals within every stratum of the confounders.
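As a quick empirical sanity check, one can cross-tabulate treatment against each confounder stratum and verify that both treatment levels appear in every stratum. This is a minimal sketch with made-up toy data; the variable names (Z for viral load, A for treatment) simply mirror this post's example:

```python
import pandas as pd

# Hypothetical toy data: Z = elevated viral load (0/1), A = treatment (0/1).
df = pd.DataFrame({
    "Z": [0, 0, 0, 0, 1, 1, 1, 1],
    "A": [0, 1, 0, 1, 0, 1, 1, 0],
})

# Count treated and untreated patients within each confounder stratum.
tab = pd.crosstab(df["Z"], df["A"])

# Positivity is (empirically) violated in any stratum with a zero cell.
violated = (tab == 0).any(axis=1)
print(violated.any())  # False here: both treatment levels occur in both strata
```

A zero cell in such a table warns that the data contain no information about one treatment arm in that stratum, so the model would be extrapolating there.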
Under these assumptions, G-methods can be used to estimate counterfactual quantities from observational data.

Steps for G-computation

To estimate the effect of treatment on CD4 count, we implement three steps: (1) fit a regression model for the outcome given treatment and confounders; (2) use the fitted model to predict each patient's counterfactual outcomes, once setting treatment to 1 and once to 0; (3) average the predicted counterfactual outcomes over the population and take their contrast. The idea is to use counterfactuals (what the result would be if the patients who received treatment had not received it, and the patients who did not receive treatment had) to estimate the average causal effect.
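The three steps can be sketched in a few lines of Python. This is a toy illustration, not the analysis of any real HIV data: the data-generating model, coefficients, and a single time point are my simplifying assumptions, chosen so that the true treatment effect is known (+50 CD4 cells) and confounded by Z.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated toy data (hypothetical): Z = elevated viral load (confounder),
# A = treatment, Y = CD4 count. Treatment probability depends on Z, so a
# naive comparison of treated vs. untreated means would be confounded.
Z = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.3 + 0.4 * Z)
Y = 400 + 50 * A - 100 * Z + rng.normal(0, 25, n)  # true effect = +50

# Step 1: fit an outcome regression E[Y | A, Z] by ordinary least squares.
X = np.column_stack([np.ones(n), A, Z])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Step 2: predict counterfactual outcomes for every patient,
# setting A=1 and A=0 in turn while keeping each patient's own Z.
Y1_hat = np.column_stack([np.ones(n), np.ones(n), Z]) @ beta
Y0_hat = np.column_stack([np.ones(n), np.zeros(n), Z]) @ beta

# Step 3: average over the population and contrast.
ate = Y1_hat.mean() - Y0_hat.mean()
print(round(ate, 1))  # close to the true effect of 50
```

Because the outcome model averages the counterfactual predictions over the observed distribution of Z, the estimate recovers the marginal effect E(Y1 - Y0) despite the confounding, which is exactly what G-computation buys us over a naive group comparison.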
Thanks for reading my posts! In later posts, I’ll explain the other G-method, namely IPWC, and explain how TMLE, G-formula, and IPWC are different and related.
Give me claps if you like the topic/posts; it will keep me motivated to write more :) Till the next time!

References

Robins J. A new approach to causal inference in mortality studies with a sustained exposure period — application to control of the healthy worker survivor effect. Mathematical Modelling, 1986.
Robins J and Hernán M. Estimation of the causal effects of time-varying exposures. In: Fitzmaurice G, Davidian M, Verbeke G, and Molenberghs G (Eds.), Advances in Longitudinal Data Analysis. Boca Raton, FL: Chapman & Hall.