Notations and Terminologies
Notations
We are interested in the causal effect of some treatment \(A\) on some outcome \(Y\)
Treatment: \(A\), binary
\(A=1\) if the person receives the treatment; \(A=0\) if they receive the control
Example: \(A=1\) if the person receives the active drug; \(A=0\) if they receive a placebo
Outcome: \(Y\), can be binary or continuous
- Example: \(Y=1\) if dead; \(Y=0\) otherwise
- Example: \(Y\) can be time until death
Pre-treatment covariates: \(X\)
Potential outcomes and counterfactuals
Potential outcomes
Potential outcome \(Y^a\) is the outcome we would see if treatment were set to \(A=a\)
Each person has two potential outcomes, \(Y^0\) and \(Y^1\)
Counterfactuals
Counterfactual outcomes: the outcomes that would have been observed had the treatment been different
If my treatment is \(A=1\), then my counterfactual outcome is \(Y^0\)
If my treatment is \(A=0\), then my counterfactual outcome is \(Y^1\)
Connection between potential and counterfactual outcomes
- Before the treatment decision is made, any outcome is a potential outcome, \(Y^0\) and \(Y^1\)
- After the study, there is an observed outcome \(Y^A\), and counterfactual outcome \(Y^{1-A}\)
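To make the distinction concrete, here is a minimal Python sketch over a hypothetical table in which, unlike real data, both potential outcomes are known for every person (the names `p1`..`p4` and all values are made up). Assuming the observed outcome equals the potential outcome for the treatment actually received (the consistency assumption discussed below), the observed outcome is \(Y^A\) and the other one, \(Y^{1-A}\), is the counterfactual.

```python
# Hypothetical potential outcomes for four people (illustration values only).
# Each row: (name, Y0, Y1, A), where A is the treatment actually received.
people = [
    ("p1", 0, 1, 1),
    ("p2", 1, 1, 0),
    ("p3", 0, 0, 1),
    ("p4", 1, 0, 0),
]

def observed_and_counterfactual(y0, y1, a):
    """Return (observed Y^A, counterfactual Y^(1-A)) for one person."""
    potential = {0: y0, 1: y1}
    return potential[a], potential[1 - a]

for name, y0, y1, a in people:
    y_obs, y_cf = observed_and_counterfactual(y0, y1, a)
    print(f"{name}: A={a}, observed Y={y_obs}, counterfactual Y^{1 - a}={y_cf}")
```

In a real study only the `observed Y` column would exist; the counterfactual column is what the rest of these notes try to recover on average.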
Immutable variables
Variables that we cannot control (or change), such as race, gender, and age, are called immutable variables
For immutable variables, causal effects are not well defined
In this course, we focus on treatments that could be thought of as interventions
What are Causal Effects?
Causal effects
Definition: treatment \(A\) has a causal effect on the outcome \(Y\), if \(Y^1\) differs from \(Y^0\)
- Example
- \(Y\): headache gone one hour from now (yes\(=1\), no\(=0\))
- \(A\): take ibuprofen (\(A=1\)) or not (\(A=0\))
Fundamental problem of causal inference
Fundamental problem of causal inference: we can only observe one potential outcome for each person
However, with certain assumptions, we can estimate population level (average) causal effects \(E(Y^1 - Y^0)\)
- Average value of \(Y\) if everyone was treated with \(A=1\) minus average value of \(Y\) if everyone was treated with \(A=0\)
- Headache example:
- Hopeless: What would have happened to me had I not taken ibuprofen? (Unit level causal effect)
- Hopeful: What would the rate of headache remission be if everyone took ibuprofen when they had a headache versus if no one did? (Population level causal effect)
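A short sketch of why the population-level question is still answerable in principle: on a hypothetical table where both potential outcomes are known (made-up values), the mean of the unit-level effects equals the difference of the two means, \(E(Y^1 - Y^0) = E(Y^1) - E(Y^0)\). With real data each person contributes only one of the two columns, which is why the assumptions below are needed to estimate each mean separately.

```python
# Hypothetical table where, unlike real data, both potential outcomes are known.
y0 = [0, 1, 0, 1, 0, 0]
y1 = [1, 1, 0, 0, 1, 1]

# Unit-level effects Y^1 - Y^0: never observable in practice.
unit_effects = [b - a for a, b in zip(y0, y1)]

# Average causal effect E(Y^1 - Y^0), computed from the full table.
ate = sum(unit_effects) / len(unit_effects)

# The same number as a difference of means, E(Y^1) - E(Y^0).
diff_of_means = sum(y1) / len(y1) - sum(y0) / len(y0)

print(unit_effects, ate, diff_of_means)
```

The second form is the useful one: it only requires estimating \(E(Y^1)\) and \(E(Y^0)\), never an individual's pair.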
Visualization of population average causal effect
Population average causal effect versus conditioning on treatment/control
\[ E(Y^1 - Y^0) \neq E(Y\mid A=1) - E(Y\mid A=0) \]
On the left-hand side, \(E(Y^1)\) is the mean of \(Y\) if the whole population were treated with \(A=1\)
On the right-hand side, \(E(Y\mid A=1)\) is the mean of \(Y\) restricted to the subpopulation of people who actually had \(A=1\)
- This subpopulation may differ from the whole population in important ways
- For example, people at higher risk for flu are more likely to choose to get a flu shot
\(E(Y\mid A=1) - E(Y\mid A=0)\) is not a causal effect, because it is comparing two different populations of people
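The flu-shot example can be simulated to show the gap. The following is a sketch with made-up probabilities: high-risk people (\(X=1\)) are more likely to get the shot, and the shot lowers flu risk in both groups. Analytically, the true effect is \(E(Y^1 - Y^0) = -0.075\), while the naive comparison is \(E(Y\mid A=1) - E(Y\mid A=0) = 0.25 - 0.16 = 0.09\): the shot looks harmful even though it helps.

```python
import random

random.seed(0)

# Hypothetical population: X = 1 marks people at high risk of flu.
# All probabilities below are made-up illustration values.
P_HIGH = 0.5
P_TREAT = {1: 0.8, 0: 0.2}        # P(A=1 | X): high-risk people seek the shot
P_FLU = {                         # P(Y^a = 1 | X) for a = 0, 1
    (1, 0): 0.40, (1, 1): 0.30,   # high risk: the shot lowers risk
    (0, 0): 0.10, (0, 1): 0.05,   # low risk: the shot lowers risk
}

def simulate(n):
    rows = []
    for _ in range(n):
        x = int(random.random() < P_HIGH)
        y0 = int(random.random() < P_FLU[(x, 0)])
        y1 = int(random.random() < P_FLU[(x, 1)])
        a = int(random.random() < P_TREAT[x])   # A depends on X only
        y = y1 if a == 1 else y0                # we observe Y = Y^A
        rows.append((x, a, y0, y1, y))
    return rows

def mean(values):
    return sum(values) / len(values)

rows = simulate(200_000)
true_ate = mean([y1 - y0 for _, _, y0, y1, _ in rows])
naive = (mean([y for _, a, _, _, y in rows if a == 1])
         - mean([y for _, a, _, _, y in rows if a == 0]))
print(f"E(Y^1 - Y^0) approx {true_ate:.3f}")         # near -0.075
print(f"E(Y|A=1) - E(Y|A=0) approx {naive:.3f}")     # near +0.090
```

The two groups being compared differ in their mix of high- and low-risk people, which is exactly the "two different populations" problem.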
Visualization of real world
Other causal effects
\(E(Y^1) / E(Y^0)\): causal relative risk
\(E(Y^1 - Y^0 \mid A=1)\): causal effect of treatment on the treated
\(E(Y^1 - Y^0 \mid V=v)\): average causal effect in the subpopulation with covariate \(V=v\)
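These estimands can be computed side by side on a hypothetical full table (made-up values, both potential outcomes known). One caveat, stated as an assumption of this sketch: the causal relative risk is taken as a ratio of means, \(E(Y^1)/E(Y^0)\), since a per-person ratio \(Y^1/Y^0\) is undefined whenever \(Y^0=0\).

```python
# Hypothetical full table (both potential outcomes known), one dict per person.
people = [
    {"Y0": 0, "Y1": 1, "A": 1, "V": "older"},
    {"Y0": 1, "Y1": 1, "A": 1, "V": "older"},
    {"Y0": 0, "Y1": 0, "A": 0, "V": "younger"},
    {"Y0": 1, "Y1": 0, "A": 0, "V": "younger"},
    {"Y0": 0, "Y1": 1, "A": 0, "V": "older"},
    {"Y0": 1, "Y1": 1, "A": 1, "V": "younger"},
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def effect(rows):
    """Average of Y^1 - Y^0 over the given rows."""
    return mean(r["Y1"] - r["Y0"] for r in rows)

# Causal relative risk as a ratio of means: E(Y^1) / E(Y^0).
causal_rr = mean(r["Y1"] for r in people) / mean(r["Y0"] for r in people)
# Effect of treatment on the treated: E(Y^1 - Y^0 | A=1).
att = effect([r for r in people if r["A"] == 1])
# Conditional effect in the subpopulation V = "older".
cate_older = effect([r for r in people if r["V"] == "older"])

print(causal_rr, att, cate_older)
```

Each estimand averages the same unit-level quantity over a different population (everyone, the treated, a covariate subgroup), so their values can differ.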
Visualization of causal effect of treatment on the treated
Causal Assumptions
Most common causal assumptions
Stable Unit Treatment Value Assumption (SUTVA)
Consistency
Ignorability
Positivity
Stable Unit Treatment Value Assumption (SUTVA)
- SUTVA involves two assumptions
No interference
- Units do not interfere with each other
- Treatment assignment of one unit does not affect the outcome of another unit
- Spillover and contagion are other terms for interference
One version of treatment
- SUTVA allows us to write each person's potential outcomes in terms of only that person's own treatment
Consistency
Consistency assumption
- Consistency assumption: the potential outcome under treatment \(A=a\), \(Y^a\), is equal to the observed outcome if the actual treatment received is \(A=a\)
\[ Y=Y^a \text{ if } A=a, \text{ for all } a \]
Ignorability
Ignorability assumption
Ignorability assumption: given pre-treatment covariates \(X\), treatment assignment is independent of the potential outcomes \[ Y^0, Y^1 \perp A \mid X \]
Among people with the same values of \(X\), we can think of treatment \(A\) as being randomly assigned
Example: \(Y^0\) and \(Y^1\) are not independent of \(A\) marginally, but within levels of \(X\), treatment might be randomly assigned
- \(X\): age; can take values ‘younger’ or ‘older’
- \(Y\): hip fracture
- Older people are more likely to get treatment \(A=1\)
- Older people are also more likely to have the outcome, regardless of treatment
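The hip-fracture example can be checked exactly by enumerating a hypothetical joint distribution (all probabilities made up). Treatment is drawn independently of \(Y^0\) within each age group, so ignorability holds by construction. Marginally, \(P(A=1\mid Y^0=1) = 0.6\) but \(P(A=1\mid Y^0=0) = 0.475\), so \(A\) and \(Y^0\) are dependent. Within the older group both equal \(0.7\). The same argument applies to \(Y^1\).

```python
from itertools import product

# Hypothetical joint distribution for the hip-fracture example (made-up numbers).
P_X = {"older": 0.5, "younger": 0.5}
P_A1_GIVEN_X = {"older": 0.7, "younger": 0.3}   # older people treated more often
P_Y0_GIVEN_X = {"older": 0.3, "younger": 0.1}   # P(Y^0 = 1 | X): older fracture more

# Build P(X, A, Y^0), with A independent of Y^0 given X (ignorability).
joint = {}
for x, a, y0 in product(P_X, (0, 1), (0, 1)):
    p_a = P_A1_GIVEN_X[x] if a == 1 else 1 - P_A1_GIVEN_X[x]
    p_y0 = P_Y0_GIVEN_X[x] if y0 == 1 else 1 - P_Y0_GIVEN_X[x]
    joint[(x, a, y0)] = P_X[x] * p_a * p_y0

def p_treated(y0, x=None):
    """P(A=1 | Y^0=y0), or P(A=1 | Y^0=y0, X=x) when x is given."""
    keep = {k: v for k, v in joint.items()
            if k[2] == y0 and (x is None or k[0] == x)}
    return sum(v for k, v in keep.items() if k[1] == 1) / sum(keep.values())

print(p_treated(1), p_treated(0))                    # marginal: about 0.6 vs 0.475
print(p_treated(1, "older"), p_treated(0, "older"))  # within X: both about 0.7
```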
Positivity
Positivity assumption
Positivity assumption: for every value of \(X\), treatment assignment is not deterministic \[ P(A=a \mid X=x) > 0, \text{ for all } a \text{ and } x \]
If for some values of \(X\), treatment was deterministic, then we would have no observed values of \(Y\) for one of the treatment groups for those values of \(X\)
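A quick empirical check for this is to look, stratum by stratum, for covariate values where one treatment arm is empty. Below is a minimal sketch. The function name `positivity_violations` and the age strata are hypothetical. Note that it only inspects the sample: an empty cell may reflect a structural violation or just a small stratum, and the code cannot tell which.

```python
from collections import defaultdict

def positivity_violations(records):
    """Return strata x in which every unit received the same treatment.

    records: iterable of (x, a) pairs with binary a. A stratum is flagged
    when the empirical P(A=1 | X=x) is 0 or 1, i.e. one arm is empty.
    """
    counts = defaultdict(lambda: [0, 0])   # x -> [n with A=0, n with A=1]
    for x, a in records:
        counts[x][a] += 1
    return sorted(x for x, (n0, n1) in counts.items() if n0 == 0 or n1 == 0)

# Hypothetical data: nobody aged 80+ received the control.
data = [("<65", 0), ("<65", 1), ("65-79", 0), ("65-79", 1), ("80+", 1), ("80+", 1)]
print(positivity_violations(data))  # ['80+']
```

For flagged strata there is no observed \(Y\) under control, so \(E(Y \mid A=0, X=x)\) cannot be estimated there.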
Standardization and Stratification
Observed data and potential outcomes
- Under all of the above assumptions, the observed mean outcome \(E(Y\mid A=a, X=x)\) equals the mean potential outcome \(E(Y^a \mid X=x)\)
\[\begin{align*} E(Y\mid A=a, X=x) &= E(Y^a\mid A=a, X=x) \text{ by consistency}\\ &= E(Y^a\mid X=x) \text{ by ignorability}\\ \end{align*}\]
- If we want a marginal causal effect, we can average over \(X\) \[ E(Y^a) = \sum_x E(Y \mid A=a, X=x) P(X=x) \]
Standardization
Standardization involves stratifying and then averaging
- First obtain the mean outcome under each treatment within each stratum, \(E(Y \mid A=a, X=x)\)
- Then pool across strata, weighting by the probability (size) of each stratum, \(P(X=x)\)
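The two steps above can be sketched as a small function over individual records with a discrete covariate. This is an illustration under stated assumptions: the function name `standardized_mean` and the record layout `(x, a, y)` are hypothetical, and the code takes consistency, ignorability given \(X\), and positivity as given. It raises an error rather than silently skipping a stratum with no units in the requested arm.

```python
from collections import Counter, defaultdict

def standardized_mean(records, a):
    """Estimate E(Y^a) = sum_x E(Y | A=a, X=x) P(X=x) for a discrete X.

    records: list of (x, a_i, y_i) tuples. Assumes consistency,
    ignorability given X, and positivity.
    """
    n = len(records)
    stratum_size = Counter(x for x, _, _ in records)   # n_x, so P(X=x) = n_x / n
    outcomes = defaultdict(list)                       # x -> outcomes of units with A=a
    for x, a_i, y_i in records:
        if a_i == a:
            outcomes[x].append(y_i)
    total = 0.0
    for x, size in stratum_size.items():
        ys = outcomes[x]
        if not ys:
            raise ValueError(f"positivity fails: no units with A={a} in stratum {x!r}")
        total += (sum(ys) / len(ys)) * (size / n)      # E(Y|A=a,X=x) * P(X=x)
    return total

# Hypothetical toy data: (stratum, treatment, outcome).
records = [("young", 1, 1), ("young", 0, 0), ("old", 1, 1), ("old", 1, 0), ("old", 0, 1)]
print(standardized_mean(records, 1), standardized_mean(records, 0))  # about 0.7 and 0.6
```

Note that the weights are the stratum shares in the whole sample, not within the treated or untreated arm. That choice is what makes the result a marginal \(E(Y^a)\).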
Standardization example: two diabetes treatments
Treatments: saxagliptin (new medicine) vs sitagliptin
Outcome: major adverse cardiac event (MACE)
Covariate: past use of oral antidiabetic (OAD) drug
Challenge
- Saxa users were more likely to have past use of OAD drugs
- Patients with past use of OAD drugs are at higher risk of MACE
Stratify patients into two subpopulations by whether they have prior OAD use
- Within levels of the prior OAD use variable, treatment can be thought of as randomized (ignorability)
Example continued: unstratified
| | MACE=yes | MACE=no | Total |
|---|---|---|---|
| Saxa=yes | 350 | 3650 | 4000 |
| Saxa=no | 500 | 6500 | 7000 |
| Total | 850 | 10150 | 11000 |
\[\begin{align*} &P(\text{MACE} \mid \text{Saxa}=\text{yes}) = 350/4000 \approx 0.088\\ &P(\text{MACE} \mid \text{Saxa}=\text{no}) = 500/7000 \approx 0.071 \end{align*}\]
Example continued: subpopulation without prior OAD use
| | MACE=yes | MACE=no | Total |
|---|---|---|---|
| Saxa=yes | 50 | 950 | 1000 |
| Saxa=no | 200 | 3800 | 4000 |
| Total | 250 | 4750 | 5000 |
\[\begin{align*} &P(\text{MACE} \mid \text{Saxa}=\text{yes}) = 50/1000 = 0.05\\ &P(\text{MACE} \mid \text{Saxa}=\text{no}) = 200/4000 = 0.05 \end{align*}\]
Example continued: subpopulation with prior OAD use
| | MACE=yes | MACE=no | Total |
|---|---|---|---|
| Saxa=yes | 300 | 2700 | 3000 |
| Saxa=no | 300 | 2700 | 3000 |
| Total | 600 | 5400 | 6000 |
\[\begin{align*} &P(\text{MACE} \mid \text{Saxa}=\text{yes}) = 300/3000 = 0.10\\ &P(\text{MACE} \mid \text{Saxa}=\text{no}) = 300/3000 = 0.10 \end{align*}\]
Example continued: mean potential outcome for Saxa
\[\begin{align*} & E(Y^{\text{saxa}})\\ = ~& E(Y\mid A=\text{saxa}, X = \text{prior OAD use yes}) P(\text{prior OAD use yes})\\ & + E(Y\mid A=\text{saxa}, X = \text{prior OAD use no}) P(\text{prior OAD use no})\\ = ~& (300/3000) (6000/11000) + (50/1000) (5000/11000) \\ \approx ~& 0.077 \end{align*}\]
Similarly, \(E(Y^{\text{sita}}) = (300/3000)(6000/11000) + (200/4000)(5000/11000) \approx 0.077\)
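Hence, under the assumptions above, choosing Saxa rather than Sita has no causal effect on the MACE outcome; the higher crude MACE rate among Saxa users reflects their more frequent prior OAD use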
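As an arithmetic check, the crude and standardized risks can be recomputed from the cell counts in the tables above. This is a plain-Python sketch. The dictionary layout and function names are this sketch's own choices, and "saxa"/"sita" stand for Saxa=yes/Saxa=no.

```python
# Cell counts copied from the stratified tables above:
# stratum -> {treatment: (number with MACE, stratum-arm total)}
tables = {
    "no prior OAD": {"saxa": (50, 1000), "sita": (200, 4000)},
    "prior OAD": {"saxa": (300, 3000), "sita": (300, 3000)},
}
n_total = sum(n for arms in tables.values() for _, n in arms.values())  # 11000

def crude_risk(treatment):
    """P(MACE | A=treatment), ignoring prior OAD use."""
    events = sum(arms[treatment][0] for arms in tables.values())
    n = sum(arms[treatment][1] for arms in tables.values())
    return events / n

def standardized_risk(treatment):
    """E(Y^treatment) = sum_x P(MACE | A=treatment, X=x) * P(X=x)."""
    risk = 0.0
    for arms in tables.values():
        stratum_n = sum(n for _, n in arms.values())
        events, n = arms[treatment]
        risk += (events / n) * (stratum_n / n_total)
    return risk

for t in ("saxa", "sita"):
    print(f"{t}: crude {crude_risk(t):.4f}, standardized {standardized_risk(t):.4f}")
# saxa: crude 0.0875, standardized 0.0773
# sita: crude 0.0714, standardized 0.0773
```

The crude comparison suggests extra risk with Saxa. Once each arm is reweighted to the same stratum mix, the difference disappears.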
Problems with standardization
Typically, many \(X\) variables are needed to achieve ignorability
Stratifying on all of them would lead to many empty cells
Alternatives to standardization: matching, inverse probability of treatment weighting (IPTW), etc.
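To preview how IPTW relates to standardization, here is a hedged sketch on the same diabetes counts. It uses the unnormalized (Horvitz-Thompson style) estimator \(E(Y^a) \approx \frac{1}{n}\sum_i \frac{I(A_i=a)\,Y_i}{P(A=a\mid X_i)}\). The propensity score is estimated as the within-stratum treatment proportion. With this saturated propensity model, IPTW reproduces the standardized result exactly. With many \(X\) variables, the propensity score would instead be modeled, for example by logistic regression, and the two methods would generally differ.

```python
# Same stratified counts as the diabetes example:
# stratum -> {treatment: (number with MACE, stratum-arm total)}
tables = {
    "no prior OAD": {"saxa": (50, 1000), "sita": (200, 4000)},
    "prior OAD": {"saxa": (300, 3000), "sita": (300, 3000)},
}
n_total = sum(n for arms in tables.values() for _, n in arms.values())

def iptw_risk(treatment):
    """Unnormalized IPTW estimate of E(Y^treatment).

    Each person who received `treatment` is weighted by 1 / P(A=treatment | X),
    with the propensity estimated as the within-stratum proportion.
    """
    weighted_events = 0.0
    for arms in tables.values():
        stratum_n = sum(n for _, n in arms.values())
        events, n = arms[treatment]
        propensity = n / stratum_n                 # estimated P(A=treatment | X=x)
        weighted_events += events / propensity     # sum of Y_i / propensity in stratum
    return weighted_events / n_total

for t in ("saxa", "sita"):
    print(f"{t}: IPTW {iptw_risk(t):.4f}")
# saxa: IPTW 0.0773
# sita: IPTW 0.0773
```

For example, each Saxa user without prior OAD use (propensity \(1000/5000 = 0.2\)) stands in for 5 people, which rebuilds the whole population's stratum mix inside the Saxa arm.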
References
Coursera class: “A Crash Course on Causality: Inferring Causal Effects from Observational Data”, by Jason A. Roy (University of Pennsylvania)