# Notations and Terminologies

### Notations

• We are interested in the causal effect of some treatment $$A$$ on some outcome $$Y$$

• Treatment: $$A$$, binary

• $$A=1$$ if receive treatment; and $$A=0$$ if receive control

• Example: $$A=1$$ if receive active drug; and $$A=0$$ if receive placebo

• Outcome: $$Y$$, can be binary or continuous

• Example: $$Y=1$$ if dead; $$Y=0$$ otherwise
• Example: $$Y$$ can be time until death
• Pre-treatment covariates: $$X$$

## Potential outcomes and conterfactuals

### Potential outcomes

• Potential outcome $$Y^a$$ is the outcome we would see if treatment was set to $$A=a$$

• Each person has potential outcome $$Y^0, Y^1$$

### Counterfactuals

• Conterfactual outcomes: the outcomes would have been observed, had the treatment been different

• If my treatment is $$A=1$$, then my counterfactual outcomes is $$Y^0$$

• If my treatment is $$A=0$$, then my counterfactual outcomes is $$Y^1$$

• Connection between potential and conterfactuals outcomes

• Before the treatment decision is made, any outcome is a potential outcome, $$Y^0$$ and $$Y^1$$
• After the study, there is an observed outcome $$Y^A$$, and counterfactual outcome $$Y^{1-A}$$

### Immutable variables

• Variables that we cannot control (or change), such as race, gender, age, are immutable variables

• For immutable variables, causal effects are not well defined

• In this course, we focus on treatments that could be thought of as interventions

# What are Causal Effects?

### Causal effects

• Definition: treatment $$A$$ has a causal effect on the outcome $$Y$$, if $$Y^1$$ differs from $$Y^0$$

• Example
• $$Y$$: headache gone one hour from now (yes$$=1$$, no$$=0$$)
• $$A$$: take ibuprofen ($$A=1$$) or not ($$A=0$$)

## Fundamental problem of causal inference

### Fundamental problem of causal inference

• Fundamental problem of causal inference: we can only observe one potential outcome for each person

• However, with certain assumptions, we can estimate population level (average) causal effects $$E(Y^1 - Y^0)$$

• Average value of $$Y$$ if everyone was treated with $$A=1$$ minus average value of $$Y$$ if everyone was treated with $$A=0$$
• Hopeless: What would have happened to me had I not taken ibuprofen? (Unit level causal effect)
• Hopeful: What would the rate of headache remission be if everyone took ibuprofen when they had a headache versus if no one did? (Population level causal effect)

### Population average causal effect versus conditioning on treatment/control

$E(Y^1 - Y^0) \neq E(Y\mid A=1) - E(Y\mid A=0)$

• In the left hand side, $$E(Y^1)$$ is the mean of $$Y$$ if the whole population was treated with $$A=1$$

• In the right hand side, $$E(Y\mid A=1)$$ is restricting to the subpopulation of people who actually had $$A=1$$

• This subpopulation may differ from the whole population in important ways
• For example, people at higher risk for flu are more likely to choose to get a flu shot
• $$E(Y\mid A=1) - E(Y\mid A=0)$$ is not a causal effect, because it is comparing two different populations of people

### Other causal effects

• $$E(Y^1 / Y^0)$$: causal relative risk

• $$E(Y^1 - Y^0 \mid A=1)$$: causal effect of treatment on the treated

• $$E(Y^1 - Y^0 \mid V=v)$$: average causal effect in the subpopulation with covariate $$V=v$$

# Causal Assumptions

### Most common causal assumptions

• Stable Unit Treatment Value Assumption (SUTVA)

• Consistency

• Ignorability

• Positivity

## Stable Unit Treatment Value Assumption (SUTVA)

### Stable Unit Treatment Value Assumption (SUTVA)

• SUTVA involves two assumptions
1. No interference

• Units do not interfere with each other
• Treatment assignement of one unit does not affect that outcome of another unit
• Spillover or contagion are also terms for interference
2. One version of treatment

• SUTVA allows us to write potential outcome for a person in terms of only that person’s treatments

## Consistency

### Consistency assumption

• Consistency assumption: the potential outcome under treatment $$A=a$$, $$Y^a$$, is equal to the observed outcome if the actual treatment received is $$A=a$$

$Y=Y^a \text{ if } A=a, \text{ for all } a$

## Ignorability

### Ignorability assumption

• Ignorability assumption: given pre-treatment covariates $$X$$, treatment assignment is independent from the potential outcomes $Y^0, Y^1 \perp A \mid X$

• Among people with the same values of $$X$$, we can think of treatment $$A$$ as being randomly assigned

• Example: $$Y^0$$ and $$Y^1$$ are not independent from $$A$$ marginally, but within levels of $$X$$, treatment might be randomly assigned

• $$X$$: age; can take values ‘younger’ or ‘older’
• $$Y$$: hip fracture
• Older people are more likely to get treatment $$A=1$$
• Older people are also more likely to have the outcome, regardless of treatment

## Positivity

### Positivity assumption

• Positivity assumption: for every set of values of $$X$$, treatment assignment was not deterministic $P(A=a \mid X=x) > 0, \text{ for all } a \text{ and } x$

• If for some values of $$X$$, treatment was deterministic, then we would have no observed values of $$Y$$ for one of the treatment groups for those values of $$X$$

# Standardization and Stratification

### Observed data and potential outcomes

• Under all above assumptions, the observed data average outcome $$E(Y\mid A=a, X=x)$$ equals the potential outcomes $$E(Y^a \mid X=x)$$

\begin{align*} E(Y\mid A=a, X=x) &= E(Y^a\mid A=a, X=x) \text{ by consistency}\\ &= E(Y^a\mid X=x) \text{ by ignorability}\\ \end{align*}

• If we want a marginal causal effect, we can average over $$X$$ $E(Y^a) = \sum_x E(Y \mid A=a, X=x) P(X=x)$

### Standardization

• Standardization involves stratifying and then averaging

• First obtain the mean treatment effect within each stratum $$E(Y \mid A=a, X=x)$$
• Then pool across stratum, weighing by the probability (size) of each stratum $$P(X=x)$$

### Standardization example: two diabetes treatments

• Treatments: saxagliptin (new medicine) vs sitagliptin

• Outcome: major adverse cardiac event (MACE)

• Covariate: past use of oral antidiabetic (OAD) drug

• Challenge

• Saxa users were more likely to have past use of OAD drug
• Patients with past use of OAD drugs are at higher risk of MACE
• Stratify parents in two subpopulations by whether having prior OAD use

• Within levels of the prior OAD use variable, treatment can be thought of as randomized (ignorability)

### Example continued: unstratified

Unstratified raw data
MACE=yes MACE=no Total
Saxa=yes 350 3650 4000
Saxa=no 500 6500 7000
Total 850 10150 11000

\begin{align*} &P(\text{MACE} \mid \text{Saxa}=\text{yes}) = 350/4000 = 0.088\\ &P(\text{MACE} \mid \text{Saxa}=\text{no}) = 500/7000 = 0.071 \end{align*}

### Example continued: subpopulation without prior OAD use

MACE=yes MACE=no Total
Saxa=yes 50 950 1000
Saxa=no 200 3800 4000
Total 250 4750 5000

\begin{align*} &P(\text{MACE} \mid \text{Saxa}=\text{yes}) = 50/1000 = 0.05\\ &P(\text{MACE} \mid \text{Saxa}=\text{no}) = 200/4000 = 0.05 \end{align*}

### Example continued: subpopulation with prior OAD use

MACE=yes MACE=no Total
Saxa=yes 300 2700 3000
Saxa=no 300 2700 3000
Total 600 5400 6000

\begin{align*} &P(\text{MACE} \mid \text{Saxa}=\text{yes}) = 300/3000 = 0.10\\ &P(\text{MACE} \mid \text{Saxa}=\text{no}) = 300/3000 = 0.10 \end{align*}

### Example continued: mean potential outcome for Saxa

\begin{align*} & E(Y^{\text{saxa}})\\ = ~& E(Y\mid A=\text{saxa}, X = \text{prior OAD use yes}) P(\text{prior OAD use yes})\\ & + E(Y\mid A=\text{saxa}, X = \text{prior OAD use no}) P(\text{prior OAD use no})\\ = & (300/3000) (6000/11000) + (50/1000) (5000/11000) \\ = & 0.077 \end{align*}

• Similarly, $$E(Y^{\text{sita}}) = 0.077$$

• Hence, the treatment Saxa or not has no causal effects on the MACE outcome

### Problems with standardization

• There will be many $$X$$ variables needed to achieve ignorability

• Stratification would lead to many empy cells

• Alternative to standardization: matching inverse probability of treatment weighting (IPTW), etc

### References

• Coursera class: “A Crash Course on Causality: Inferring Causal Effects from Observational Data”, by Jason A. Roy (University of Pennsylvania)