
# Introduction to Instrumental Variables

### Unmeasured confounding

- Suppose there are unobserved variables \(U\) that affect both \(A\) and \(Y\). Then \(U\) is an unmeasured confounder

This violates the ignorability assumption

Since we cannot control for the unobserved confounders \(U\) or average over their distribution, estimates of causal effects from matching or IPTW methods are biased

Solution: instrumental variables

### Instrumental variables

- Instrumental variables (IV): an alternative causal inference method that does not rely on the ignorability assumption

\(Z\) is an IV

- It affects treatment \(A\), but does not directly affect the outcome \(Y\)
- We can think of \(Z\) as encouragement (of treatment)

### Example of an encouragement design

- \(A\): smoking during pregnancy (yes/no)
- \(Y\): birth weight
- \(X\): mother’s age, weight, etc.

- Concern: there could be unmeasured confounders
- Challenge: it is not ethical to randomly assign smoking

\(Z\): randomized to either receive encouragement to stop smoking (\(Z=1\)) or receive usual care (\(Z=0\))

- Causal effect of encouragement, also called intent-to-treat (ITT) effect, may be of some interest \[E\left(Y^{Z=1}\right)-E\left(Y^{Z=0}\right)\]
- Focus of IV methods is still causal effect of the treatment \[E\left(Y^{A=1}\right)-E\left(Y^{A=0}\right)\]

### IV is randomized

Like the previous smoking example, sometimes the IV is randomly assigned as part of the study

Other times the IV is **believed** to be randomized in nature (natural experiment). For example,

- Mendelian randomization
- Quarter of birth
- Geographic distance to specialty care provider

## Randomized trials with noncompliance

### Randomized trials with noncompliance

- Setup
  - \(Z\): **randomization** to treatment (1 treatment, 0 control)
  - \(A\): treatment received, binary (1 treatment, 0 control)
  - \(Y\): outcome

- Due to noncompliance, not everyone assigned to treatment will actually receive it, and vice versa (\(A \neq Z\) for some subjects)
- There can be confounding \(X\), like common causes affecting both treatment received \(A\) and the outcome \(Y\)
- It may be reasonable to assume that \(Z\) does not directly affect \(Y\)

### Causal effect of assignment on receipt

Observed data: \((Z, A, Y)\)

Each subject has two potential values of treatment

- \(A^{Z=1} = A^1\): value of treatment if randomized to treatment
- \(A^{Z=0} = A^0\): value of treatment if randomized to control

- Average causal effect of treatment assignment on treatment received
\[E\left(A^1 - A^0\right)\]
- Under perfect compliance, this would be \(1\)
- By randomization and consistency, this is estimable from the observed data \[ E\left(A^1\right) = E(A \mid Z=1), \quad E\left(A^0\right) = E(A \mid Z=0) \]

### Causal effect of assignment on outcome

Average causal effect of treatment assignment on the outcome \[E\left(Y^{Z=1} - Y^{Z=0}\right)\]

- This is the intention-to-treat (ITT) effect
- Under perfect compliance, this would be equal to the causal effect of treatment received
- By randomization and consistency, this is estimable from the observed data \[ E\left(Y^{Z=1}\right) = E(Y \mid Z=1), \quad E\left(Y^{Z=0}\right) = E(Y \mid Z=0) \]
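Both identification results above reduce to differences in arm means. The following is a minimal sketch of that computation, assuming a made-up eight-subject trial; the `records` values and the helper name `mean_by_arm` are illustrative and not part of the course material.

```python
from statistics import mean

# Hypothetical toy trial: each record is (Z assigned, A received, Y outcome).
records = [
    (1, 1, 7.0), (1, 1, 6.0), (1, 0, 3.0), (1, 1, 8.0),
    (0, 0, 4.0), (0, 0, 3.0), (0, 1, 6.0), (0, 0, 3.0),
]


def mean_by_arm(records, column, z):
    """Sample mean of one column (1 = A, 2 = Y) among subjects with Z == z."""
    return mean(r[column] for r in records if r[0] == z)


# Estimated effect of assignment on treatment received: E(A|Z=1) - E(A|Z=0)
effect_on_receipt = mean_by_arm(records, 1, 1) - mean_by_arm(records, 1, 0)

# Estimated intention-to-treat effect on the outcome: E(Y|Z=1) - E(Y|Z=0)
itt_effect = mean_by_arm(records, 2, 1) - mean_by_arm(records, 2, 0)

print(effect_on_receipt, itt_effect)
```

In this toy data set, three of four subjects assigned to treatment take it and one of four assigned to control does, so the estimated effect of assignment on receipt is 0.5.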

## Compliance classes

### Subpopulations based on potential treatment

\(A^0\) | \(A^1\) | Label |
---|---|---|
0 | 0 | Never-takers |
0 | 1 | Compliers |
1 | 0 | Defiers |
1 | 1 | Always-takers |

- For never-takers and always-takers,
  - Encouragement does not work
  - Due to no variation in treatment received, we cannot learn anything about the effect of treatment in these two subpopulations

- For compliers, treatment received is randomized
- For defiers, treatment received is also randomized, but in the opposite way

### Local average treatment effect

- We will focus on a local average treatment effect, i.e., the complier average causal effect (CACE)

\[\begin{align*} & E\left(Y^{Z=1} \mid A^0=0, A^1=1 \right) - E\left(Y^{Z=0} \mid A^0=0, A^1=1 \right)\\ = & E\left(Y^{Z=1} - Y^{Z=0} \mid \text{compliers} \right)\\ = & E\left(Y^{a=1} - Y^{a=0} \mid \text{compliers} \right) \end{align*}\]

- “Local”: this is a causal effect in a subpopulation
- No inference about defiers, always-takers, or never-takers

## Instrumental variable assumptions

### IV assumption 1: exclusion restriction

- \(Z\) is associated with the treatment \(A\)

\(Z\) affects the outcome only through its effect on treatment

- \(Z\) cannot affect \(Y\) directly, or indirectly through an effect on \(U\)

### Is the exclusion restriction assumption realistic?

If \(Z\) is a random treatment assignment, then the exclusion restriction assumption is plausibly met

- It should affect treatment received
- It should not affect the outcome or unmeasured confounders

However, if the subjects or clinicians are not blinded, knowledge of the assignment could affect \(Y\) or \(U\)

We need to examine the exclusion restriction assumption carefully for any given study

### IV assumption 2: monotonicity

Monotonicity assumption: there are no defiers

- No one consistently does the opposite of what they are told
- Probability of treatment should increase with more encouragement

With monotonicity,

\(Z\) | \(A\) | \(A^0\) | \(A^1\) | Class |
---|---|---|---|---|
0 | 0 | 0 | ? | Never-takers or compliers |
0 | 1 | 1 | 1 | Always-takers |
1 | 0 | 0 | 0 | Never-takers |
1 | 1 | ? | 1 | Always-takers or compliers |
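The logic of the table can be sketched in a few lines of Python: a compliance class is compatible with an observed pair \((Z, A)\) when its potential treatment \(A^Z\) equals the observed \(A\). The names `CLASSES` and `compatible_classes` are hypothetical helpers introduced only for this illustration.

```python
# Potential treatments (A^0, A^1) for each compliance class.
CLASSES = {
    "never-taker": (0, 0),
    "complier": (0, 1),
    "defier": (1, 0),
    "always-taker": (1, 1),
}


def compatible_classes(z, a, monotonicity=True):
    """Return the classes whose potential treatment A^z equals the observed a."""
    result = []
    for label, potential in CLASSES.items():
        if monotonicity and label == "defier":
            continue  # monotonicity rules out defiers
        if potential[z] == a:
            result.append(label)
    return result


for z in (0, 1):
    for a in (0, 1):
        print(z, a, compatible_classes(z, a))
```

Calling `compatible_classes(0, 1, monotonicity=False)` also returns defiers, which shows why subjects with \(Z=0, A=1\) can only be labeled always-takers once monotonicity is assumed.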

# Estimate Causal Effects with Instrumental Variables

### Estimate CACE: 1. rewrite the ITT effect

Due to randomization, we can identify the ITT effect \[ E\left( Y^{Z=1} - Y^{Z=0} \right) = E(Y\mid Z=1) - E(Y\mid Z=0) \]

Expand the first term in the above ITT effect \[\begin{align*} E(Y\mid Z=1) = & E(Y\mid Z=1, \text{always takers})P(\text{always takers}\mid Z=1)\\ & + E(Y\mid Z=1, \text{never takers})P(\text{never takers}\mid Z=1)\\ & + E(Y\mid Z=1, \text{compliers})P(\text{compliers}\mid Z=1) \end{align*}\]

- Note 1: among always takers and never takers, \(Z\) does not change the treatment received, so by the exclusion restriction it does not change the outcome
  - \(E(Y\mid Z=1, \text{always takers}) = E(Y\mid \text{always takers}), \quad \text{etc.}\)

- Note 2: by randomization,
  - \(P(\text{always takers}\mid Z=1) = P(\text{always takers}), \quad \text{etc.}\)

### Estimate CACE: 1. rewrite the ITT effect, cont.

Therefore, the first term in the ITT effect is \[\begin{align*} E(Y\mid Z=1)=& E(Y\mid\text{always takers})P(\text{always takers})\\ & + E(Y\mid \text{never takers})P(\text{never takers})\\ & + E(Y\mid Z=1, \text{compliers})P(\text{compliers}) \end{align*}\]

Similarly, the second term is \[\begin{align*} E(Y\mid Z=0)=& E(Y\mid\text{always takers})P(\text{always takers})\\ & + E(Y\mid \text{never takers})P(\text{never takers})\\ & + E(Y\mid Z=0, \text{compliers})P(\text{compliers}) \end{align*}\]

Their difference is \[\begin{align*} & E(Y\mid Z=1) - E(Y\mid Z=0)\\ = & \left[E(Y\mid Z=1, \text{compliers})- E(Y\mid Z=0, \text{compliers})\right]P(\text{compliers}) \end{align*}\]
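As a numerical check of this cancellation, the sketch below plugs made-up class shares and class-specific mean outcomes into the two expansions. All numbers are assumptions chosen for illustration; only the compliers' mean outcome is allowed to differ between \(Z=1\) and \(Z=0\), as the derivation requires.

```python
# Hypothetical class shares (must sum to 1) and mean outcomes by class.
share = {"always takers": 0.2, "never takers": 0.3, "compliers": 0.5}

# Only the compliers' mean outcome depends on Z.
mean_y_given_z1 = {"always takers": 10.0, "never takers": 4.0, "compliers": 9.0}
mean_y_given_z0 = {"always takers": 10.0, "never takers": 4.0, "compliers": 5.0}


def marginal_mean(class_means):
    """E(Y | Z=z) as a share-weighted average over the compliance classes."""
    return sum(class_means[c] * share[c] for c in share)


itt = marginal_mean(mean_y_given_z1) - marginal_mean(mean_y_given_z0)
complier_contrast = mean_y_given_z1["compliers"] - mean_y_given_z0["compliers"]

# The always-taker and never-taker terms cancel in the difference.
print(itt, complier_contrast * share["compliers"])
```

Both printed quantities agree up to floating-point rounding: the ITT effect equals the complier contrast scaled by the share of compliers.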

### Estimate CACE: 2. compute proportion of compliers

The bracketed complier contrast in the ITT decomposition is the CACE, so the relationship between the CACE and the ITT effect is \[ \text{CACE} = \frac{E(Y\mid Z=1) - E(Y\mid Z=0)}{P(\text{compliers})} \]

To compute \(P(\text{compliers})\), note that

- \(E(A\mid Z=1)\): proportion of always takers plus compliers
- \(E(A\mid Z=0)\): proportion of always takers

Thus the difference is \[ P(\text{compliers}) = E(A\mid Z=1) - E(A\mid Z=0) \]

### Estimate CACE: final formula

\[ \text{CACE} = \frac{E(Y\mid Z=1) - E(Y\mid Z=0)} {E(A\mid Z=1) - E(A\mid Z=0)} \]

- Numerator: ITT, causal effect of treatment assignment on the outcome
- Denominator: causal effect of treatment assignment on the treatment received
- Denominator is between 0 and 1. Thus, \(|\text{CACE}| \geq |\text{ITT}|\)
- ITT underestimates the magnitude of the CACE, because some people assigned to treatment did not take it

Under perfect compliance, CACE \(=\) ITT
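The final formula is a ratio of two differences in arm means. A minimal sketch for binary \(Z\) follows; the function name `wald_cace` and the eight-subject data set are assumptions made up for this example.

```python
from statistics import mean


def wald_cace(z, a, y):
    """Ratio of the ITT effect on Y to the ITT effect on A, for binary Z."""
    def arm_mean(values, arm):
        return mean(v for v, zi in zip(values, z) if zi == arm)

    itt_outcome = arm_mean(y, 1) - arm_mean(y, 0)   # numerator
    itt_receipt = arm_mean(a, 1) - arm_mean(a, 0)   # denominator
    if itt_receipt == 0:
        raise ValueError("no estimated compliers: the ratio is undefined")
    return itt_outcome / itt_receipt


# Hypothetical toy trial (assignment, treatment received, outcome).
z = [1, 1, 1, 1, 0, 0, 0, 0]
a = [1, 1, 0, 1, 0, 0, 1, 0]
y = [7.0, 6.0, 3.0, 8.0, 4.0, 3.0, 6.0, 3.0]

cace_hat = wald_cace(z, a, y)
print(cace_hat)
```

Here the ITT estimate is 2.0 and the estimated proportion of compliers is 0.5, so the CACE estimate is 4.0, larger in magnitude than the ITT estimate.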

## IVs in observational studies

### IVs in observational studies

IVs can also be used in observational (non-randomized) studies

- \(Z\): instrument
- \(A\): treatment
- \(Y\): outcome
- \(X\): covariates

- \(Z\) can be thought of as encouragement
- If binary, just encouragement yes or no
- If continuous, a ‘dose’ of encouragement

\(Z\) can be thought of as a randomizer in a natural experiment

- The key challenge: think of a variable that affects \(Y\) only through \(A\)
- Only the assumption that \(Z\) affects \(A\) can be checked with data
- The validity of the exclusion restriction assumption relies on subject matter knowledge

### Natural experiment example 1: calendar time as IV

Rationale: sometimes treatment preferences change over a short period of time

\(A\): drug A vs drug B

\(Z\): early time period (drug A is encouraged) vs late time period (drug B is encouraged)

\(Y\): BMI

### Natural experiment example 2: distance as IV

Rationale: shorter distance to NICU is an encouragement

\(A\): delivery at high level NICU vs regular hospital

\(Z\): differential travel time from nearest high level NICU to nearest regular hospital

\(Y\): mortality

### More examples of natural experiments

Mendelian randomization: some genetic variant is associated with some behavior (e.g., alcohol use) but is assumed to affect the outcome of interest only through that behavior

Provider preference: use treatment prescribed to previous patients as an IV for current patient

Quarter of birth: to study causal effect of years in school on income

## Two stage least squares

### Ordinary least squares (OLS) fails if there is confounding

- In OLS, one important assumption is that the covariate \(A\) is independent of the error term \(\epsilon\)

\[ Y_i = \beta_0 + A_i \beta_1 + \epsilon_i \]

However, if there is confounding, \(A\) and \(\epsilon\) are correlated, so the OLS estimate of \(\beta_1\) is biased

Two stage least squares can estimate the causal effect in the instrumental variables (IV) setting
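The failure of OLS under confounding can be seen in a small simulation. This is only a sketch under an assumed data-generating process: the unmeasured confounder `u`, the coefficients, and the true effect of 2.0 are all made up for illustration.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Slope from a simple linear regression of y on a single regressor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)
true_effect = 2.0  # assumed causal effect of A on Y in this simulation

a, y = [], []
for _ in range(20_000):
    u = rng.gauss(0, 1)                                  # unmeasured confounder
    ai = 1 if u + rng.gauss(0, 1) > 0 else 0             # U raises the chance of treatment
    yi = true_effect * ai + 3.0 * u + rng.gauss(0, 1)    # U also raises the outcome
    a.append(ai)
    y.append(yi)

naive_estimate = ols_slope(a, y)
print(naive_estimate)  # well above true_effect, because A is correlated with the error
```

Because `u` is left out of the regression, it is absorbed into the error term, which is then correlated with \(A\); the OLS slope picks up the effect of `u` along with the effect of treatment.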

### Two stage least squares (2SLS)

- Stage 1: regress \(A\) on \(Z\)
\[
A_i = \alpha_0 + Z_i \alpha_1 + e_i
\]
- By randomization, \(Z\) and \(e\) are independent

- Obtain the predicted value of \(A\) given \(Z\) for each subject
\[
\hat{A}_i = \hat{\alpha}_0 + Z_i \hat{\alpha}_1
\]
- \(\hat{A}\) is projection of \(A\) onto the space spanned by \(Z\)

- Stage 2: regress \(Y\) on \(\hat{A}\)
\[
Y_i = \beta_0 + \hat{A}_i \beta_1 + \epsilon_i
\]
- By exclusion restriction, \(Z\) is independent of \(Y\) given \(A\)
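The two stages can be carried out by hand with two simple regressions. The sketch below assumes a simulated trial with a randomized binary instrument, an unmeasured confounder `u`, and a constant treatment effect of 2.0; the data-generating process and the helper `simple_ols` are illustrative assumptions, not the course's implementation.

```python
import random
from statistics import fmean


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a single regressor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(1)
true_effect = 2.0  # assumed constant causal effect of A on Y

z, a, y = [], [], []
for _ in range(50_000):
    u = rng.gauss(0, 1)                        # unmeasured confounder
    zi = 1 if rng.random() < 0.5 else 0        # randomized instrument
    # Encouragement only shifts the latent propensity upward, so there are no defiers.
    ai = 1 if u + rng.gauss(0, 1) + 1.5 * zi - 0.75 > 0 else 0
    yi = true_effect * ai + 3.0 * u + rng.gauss(0, 1)   # Z enters only through A
    z.append(zi)
    a.append(ai)
    y.append(yi)

# Stage 1: regress A on Z and form the predicted treatment.
alpha0_hat, alpha1_hat = simple_ols(z, a)
a_hat = [alpha0_hat + alpha1_hat * zi for zi in z]

# Stage 2: regress Y on the predicted treatment.
_, beta1_hat = simple_ols(a_hat, y)

# For comparison: naive OLS of Y on the treatment actually received.
_, naive_slope = simple_ols(a, y)

print(beta1_hat, naive_slope)
```

In this simulation the second-stage slope lands near the assumed effect while the naive slope is far above it. One caveat worth keeping in mind: standard errors taken from an ordinary second-stage regression are generally not valid, because they ignore that \(\hat{A}\) was itself estimated.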

### Interpretation of \(\beta_1\) in 2SLS: the causal effect

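Consider the case where both \(Z\) and \(A\) are binary. The slope \(\beta_1\) is the mean difference in \(Y\) per one-unit change in \(\hat{A}\), written informally as \[ \beta_1 = E\left(Y \mid \hat{A}=1 \right) - E\left(Y \mid \hat{A}=0 \right) \]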

There are two values of \(\hat{A}\) in the 2nd stage model, \(\hat{\alpha}_0\) and \(\hat{\alpha}_0 + \hat{\alpha}_1\)

- When we go from \(Z=0\) to \(Z=1\), what we observe is going from \(\hat{\alpha}_0\) to \(\hat{\alpha}_0 + \hat{\alpha}_1\)
- We observe a mean difference of \(\hat{E}(Y\mid Z=1) - \hat{E}(Y\mid Z=0)\) with a \(\hat{\alpha}_1\) unit change in \(\hat{A}\)

Thus, we should observe a mean difference of \(\frac{\hat{E}(Y\mid Z=1) - \hat{E}(Y\mid Z=0)}{\hat{\alpha}_1}\) with \(1\) unit change in \(\hat{A}\)

Since \(\hat{\alpha}_1 = \hat{E}(A\mid Z=1) - \hat{E}(A\mid Z=0)\) for binary \(Z\), the 2SLS estimator is a consistent estimator of the CACE \[ \hat{\beta}_1 = \frac{\hat{E}(Y\mid Z=1) - \hat{E}(Y\mid Z=0)}{\hat{E}(A\mid Z=1) - \hat{E}(A\mid Z=0)} \]
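With binary \(Z\), the second-stage slope and the ratio of arm-mean differences coincide exactly in any sample, not just in large ones. The sketch below checks this on a made-up eight-subject data set; `simple_ols` and `arm_mean` are hypothetical helper names.

```python
from statistics import fmean


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a single regressor x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical toy trial (assignment, treatment received, outcome).
z = [1, 1, 1, 1, 0, 0, 0, 0]
a = [1, 1, 0, 1, 0, 0, 1, 0]
y = [7.0, 6.0, 3.0, 8.0, 4.0, 3.0, 6.0, 3.0]

# Two stage least squares by hand.
alpha0_hat, alpha1_hat = simple_ols(z, a)
a_hat = [alpha0_hat + alpha1_hat * zi for zi in z]
_, beta1_hat = simple_ols(a_hat, y)


def arm_mean(values, arm):
    """Sample mean of `values` among subjects with Z == arm."""
    return fmean(v for v, zi in zip(values, z) if zi == arm)


# Ratio of differences in arm means.
ratio = (arm_mean(y, 1) - arm_mean(y, 0)) / (arm_mean(a, 1) - arm_mean(a, 0))

print(beta1_hat, ratio)
```

Both numbers equal 4.0 here up to floating-point rounding: the first stage gives \(\hat{\alpha}_1 = 0.5\), and the difference in mean outcomes between arms is 2.0.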

### More general 2SLS

2SLS can be used

- with covariates \(X\), and
- for non-binary data (e.g., a continuous instrument)

Stage 1: regress \(A\) on \(Z\) and covariates \(X\)

- Obtain the fitted values \(\hat{A}\)

Stage 2: regress \(Y\) on \(\hat{A}\) and \(X\)

- Coefficient of \(\hat{A}\) estimates the causal effect
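With a covariate, each stage becomes a multiple regression. The sketch below solves the normal equations with a small hand-written Gaussian elimination so that it needs only the standard library; in practice one would more likely use a dedicated routine from a statistics package. The helpers `solve` and `ols`, the simulated data, and the assumed true effect of 2.0 are all illustrative.

```python
import random


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (aug[r][n] - known) / aug[r][r]
    return coef


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


rng = random.Random(2)
true_effect = 2.0  # assumed constant causal effect of A on Y

z, x, a, y = [], [], [], []
for _ in range(50_000):
    u = rng.gauss(0, 1)                      # unmeasured confounder
    xi = rng.gauss(0, 1)                     # measured covariate
    zi = 1 if rng.random() < 0.5 else 0      # instrument
    ai = 1 if u + xi + rng.gauss(0, 1) + 1.5 * zi - 0.75 > 0 else 0
    yi = true_effect * ai + xi + 3.0 * u + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    a.append(ai)
    y.append(yi)

# Stage 1: regress A on (1, Z, X) and compute fitted values.
s1 = ols([[1.0, zi, xi] for zi, xi in zip(z, x)], a)
a_hat = [s1[0] + s1[1] * zi + s1[2] * xi for zi, xi in zip(z, x)]

# Stage 2: regress Y on (1, A_hat, X); the coefficient on A_hat is the estimate.
s2 = ols([[1.0, ah, xi] for ah, xi in zip(a_hat, x)], y)
beta1_hat = s2[1]

print(beta1_hat)
```

The same covariates appear in both stages, which matches the two-stage description above.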

## Sensitivity analysis and weak instruments

### Sensitivity analysis

Sensitivity analysis studies how the conclusions change when each IV assumption (partly) fails

- Exclusion restriction: if \(Z\) does affect \(Y\) by an amount \(p\), would my conclusion change? Vary \(p\)
- Monotonicity: if the proportion of defiers were \(\pi\), would my conclusion change?
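One simple way to vary \(p\), offered here as an assumption rather than as the method used in the course: suppose \(Z\) shifts \(Y\) directly by the same amount \(p\) for every subject. Then the observed ITT effect equals \(\text{CACE} \times P(\text{compliers}) + p\), and the implied CACE is \((\text{ITT} - p) / P(\text{compliers})\). The function name and the input estimates below are hypothetical.

```python
def cace_with_direct_effect(itt_outcome, itt_receipt, direct_effect):
    """CACE implied if Z shifts Y directly by `direct_effect` for every subject.

    Assumes a constant, additive direct effect of Z on Y (an illustrative
    simplification, not a general sensitivity model).
    """
    return (itt_outcome - direct_effect) / itt_receipt


# Hypothetical estimates: ITT effect on Y and estimated proportion of compliers.
itt_outcome, itt_receipt = 2.0, 0.5

sensitivity = {
    p: cace_with_direct_effect(itt_outcome, itt_receipt, p)
    for p in (0.0, 0.5, 1.0, 2.0)
}
for p, cace in sensitivity.items():
    print(f"p = {p}: implied CACE = {cace}")
```

With these inputs the implied CACE falls from 4.0 at \(p = 0\) to 0.0 at \(p = 2\), so a direct effect as large as the whole ITT effect would explain away the result.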

### Strength of IVs

Depending on how well an IV predicts treatment received, we can classify it as a strong instrument or a weak instrument

For a weak instrument, encouragement barely increases the probability of treatment

Measure the strength of an instrument: estimate the proportion of compliers \[ E(A \mid Z=1) - E(A \mid Z=0) \]

- Alternatively, we can just use the observed proportions of treated subjects for \(Z=1\) and for \(Z=0\)

### Problems of weak instruments

Suppose only 1% of the population are compliers

Then only about 1% of the sample carries useful information about the treatment effect

- This leads to large variance estimates, i.e., the estimate of the causal effect is unstable
- The confidence intervals can be too wide to be useful
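The instability can be illustrated by repeating a simulated trial many times and comparing the spread of the ratio estimate under a strong and a weak instrument. This sketch uses 10% compliers rather than 1% to keep the simulation small; the data-generating process (compliers and never-takers only, true effect 2.0) is an assumption for illustration.

```python
import random
from statistics import stdev


def simulate_wald(rng, complier_share, n=2000, true_effect=2.0):
    """Simulate one trial and return the ratio (Wald) estimate of the CACE."""
    # Per arm: [sum of A, sum of Y, number of subjects]
    sums = {0: [0, 0.0, 0], 1: [0, 0.0, 0]}
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0
        complier = rng.random() < complier_share   # everyone else is a never-taker
        ai = zi if complier else 0
        yi = true_effect * ai + rng.gauss(0, 1)
        sums[zi][0] += ai
        sums[zi][1] += yi
        sums[zi][2] += 1
    mean_a = {arm: s[0] / s[2] for arm, s in sums.items()}
    mean_y = {arm: s[1] / s[2] for arm, s in sums.items()}
    return (mean_y[1] - mean_y[0]) / (mean_a[1] - mean_a[0])


rng = random.Random(3)
strong = [simulate_wald(rng, complier_share=0.8) for _ in range(100)]
weak = [simulate_wald(rng, complier_share=0.1) for _ in range(100)]

sd_strong, sd_weak = stdev(strong), stdev(weak)
print(sd_strong, sd_weak)  # the weak-instrument estimates vary far more across trials
```

Dividing a noisy ITT estimate by a small estimated proportion of compliers magnifies the noise, which is why the weak-instrument estimates are so much more variable.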

### References

Coursera class: “A Crash Course on Causality: Inferring Causal Effects from Observational Data”, by Jason A. Roy (University of Pennsylvania)