
# Matching

## Experiments vs observational studies

### Randomized trials

In experiments, treatment \(A\) is determined by a coin toss; so there are no arrows from \(X\) to \(A\), i.e., no backdoor paths

Covariate balance: the distribution of pre-treatment variables \(X\) is the same in both treatment groups

- Hence, if there is difference in the outcome, it is not because of \(X\)

### Observational studies

In observational studies, the distribution of \(X\) may differ between treatment groups

For example, older people may be more likely to get treatment \(A=1\):

## Overview of matching

### Matching

Matching: a method that attempts to make an observational study more like a randomized trial

**Main idea of matching**: match individuals in the treated group \((A=1)\) to individuals in the control group \((A=0)\) on the covariates \(X\)

- Usually, the sample size of the treated group is smaller than that of the control group, so after matching we will use all cases in the treated group but only a fraction of the cases in the control group

In the example where older people are more likely to get \(A=1\):

- At younger (older) ages, there are more people with \(A=0\) (\(A=1\))
- In a randomized trial, for any particular age, there should be about the same number of treated and untreated people

This balance can be achieved by matching treated people to control people of the same age

### Advantages of matching

Controlling for confounders is achieved at the design phase, i.e., without looking at the outcome

Matching will reveal lack of overlap in covariate distribution

- Positivity assumption will hold in the population that can be matched

We can treat a matched dataset as if it came from a randomized trial

Outcome analysis is simple

### Match on a single covariate

Suppose red patients are more likely to be treated than blue ones

- Before matching

- After matching

### Match on many covariates

We will not be able to exactly match on the full set of covariates

In randomized trials, treated and control subjects are not perfect matches either; the distribution of covariates is balanced between groups (stochastic balance)

Matching closely on covariates can achieve stochastic balance

### Example of matching on two covariates: sex and age

- Before matching

- After matching

### Target population of matching: the treated

By matching, we are making the distribution of covariates in the control population look like that in the treated population

So we will analyze the causal effect of treatment on the treated

- There are matching methods that can be used to target a different population (beyond scope of this course)

### Fine balance

Sometimes it is hard to find great matches, so we are willing to accept some non-ideal matches, **as long as the treated and control groups have the same distribution of covariates (fine balance)**

For the matches below, average age and percent female are the same in both groups, although neither match is great

- Match 1: treated, male, age 40 and control, female, age 45
- Match 2: treated, female, age 45 and control, male, age 40

### Number of matches

One to one (pair matching): match one control to every treated subject

Many to one: match \(K\) (a fixed number) controls to every treated subject; e.g., 5 to 1 matching

Variable: match sometimes one control, sometimes more than one (if multiple good matches are available), to each treated subject

### Metrics used in matching

Mahalanobis distance between two vectors: \[ D(\mathbf{X}_i, \mathbf{X}_j) = \sqrt{(\mathbf{X}_i - \mathbf{X}_j)^T \mathbf{S}^{-1}(\mathbf{X}_i - \mathbf{X}_j)}, \quad \mathbf{S} = cov(\mathbf{X}) \]

- We scale by the covariance so that the Mahalanobis distance is invariant to changes of units

Robust Mahalanobis distance: robust to outliers

- Replace each covariate value with its rank
- Use the covariance matrix of the ranks, with its diagonal set to a constant
- Calculate the usual Mahalanobis distance on the ranks
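As an illustration (the slides show no code), both distances can be sketched in Python with NumPy; `mahalanobis` and `robust_mahalanobis` are hypothetical helper names:

```python
import numpy as np

def mahalanobis(x_i, x_j, S):
    """Mahalanobis distance between covariate vectors x_i and x_j,
    scaled by the covariance matrix S so the distance is unit-free."""
    d = x_i - x_j
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))

def robust_mahalanobis(X, i, j):
    """Robust variant: replace each covariate by its within-column rank,
    set the covariance diagonal to a constant, then use the usual formula."""
    n = X.shape[0]
    R = np.argsort(np.argsort(X, axis=0), axis=0) + 1.0  # ranks 1..n per column
    S = np.cov(R, rowvar=False)
    np.fill_diagonal(S, (n**2 - 1) / 12.0)  # variance of the ranks 1..n
    return mahalanobis(R[i], R[j], S)
```

Because the ranks are bounded, a single extreme covariate value cannot dominate the robust distance.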

### Types of matching

- Greedy (nearest neighbor) matching
  - Not ideal, but computationally fast
- Optimal matching
  - Better, but computationally demanding

## Nearest neighbor matching

### Setup

We have selected a set of pre-treatment covariates \(X\) that satisfy the ignorability assumption

We have calculated a distance \(d_{i,j}\) between each treated subject \(i\) and each control subject \(j\)

We have more control subjects than treated subjects

We will focus on pair matching (one-to-one)

### Nearest neighbor matching (greedy)

Randomly order list of treated subjects and control subjects

Start with the first treated subject, match to the control with the smallest distance

Remove the matched control from the list of available matches

Move on to the next treated subject. Repeat until all treated subjects are matched

Not invariant to the initial order of the list

Not optimal: always taking the smallest distance match does not minimize total distance
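The steps above can be sketched in standalone Python (the slides use the R package MatchIt; this toy version is only illustrative, and `greedy_match` is a hypothetical name):

```python
import numpy as np

def greedy_match(dist):
    """Greedy (nearest-neighbor) pair matching.
    dist: (n_treated, n_control) distance matrix.
    Returns a dict mapping treated index -> matched control index."""
    n_t, n_c = dist.shape
    available = set(range(n_c))
    matches = {}
    for i in range(n_t):  # treated subjects in list order
        # best remaining control for this treated subject
        j = min(available, key=lambda c: dist[i, c])
        matches[i] = j
        available.remove(j)  # each control is used at most once
    return matches
```

Because each treated subject grabs its nearest remaining control, an early match can force a later treated subject into a very distant control, which is why the total distance is not minimized.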

R package: MatchIt, https://cran.r-project.org/web/packages/MatchIt/MatchIt.pdf

### Many-to-one matching

For \(K\):1 matching: after everyone has one match, go through the list again and find second matches; repeat until everyone has \(K\) matches

Pair matching (one-to-one) vs many-to-one matching: a bias-variance tradeoff

Pair matching: closer matches, faster computing time

Many-to-one matching: larger sample size
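The repeated-pass procedure for \(K\):1 matching can be sketched as follows (a hypothetical standalone example; it assumes there are at least \(K\) times as many controls as treated subjects):

```python
import numpy as np

def greedy_match_k(dist, K):
    """K:1 greedy matching: pass over the treated list K times,
    adding one more control match per treated subject per pass.
    dist: (n_treated, n_control) distance matrix; requires n_control >= K * n_treated."""
    n_t, n_c = dist.shape
    available = set(range(n_c))
    matches = {i: [] for i in range(n_t)}
    for _ in range(K):  # pass 1 finds first matches, pass 2 second matches, ...
        for i in range(n_t):
            j = min(available, key=lambda c: dist[i, c])
            matches[i].append(j)
            available.remove(j)
    return matches
```

The later passes illustrate the bias-variance tradeoff: the second-best control is farther away (more bias) but the matched sample is larger (less variance).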

### Caliper

We may exclude a treated subject if there is no good match for it

Caliper: maximum acceptable distance

- Only match a treated subject if the best match has distance less than the caliper

If there are no matches within the caliper, it is a sign that the positivity assumption is violated, so we should exclude these subjects

Drawback: population harder to define

## Optimal matching

### Optimal matching

Optimal matching: minimizes a global distance measure, e.g., the total distance

Computational feasibility of optimal matching: depends on the size of the problem

Number of possible treatment-control pairings: the number of treated subjects times the number of controls

1 million treatment-control pairings is feasible on most computers (not quick, though)

1 billion pairings is not feasible
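For 1:1 matching, the global minimum can be found with the Hungarian algorithm; a sketch using SciPy's `linear_sum_assignment` (the R packages below implement richer matching designs):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy distance matrix: rows are treated subjects, columns are controls.
# Greedy matching would pair (0, 0) and (1, 1) for a total distance of 101.
dist = np.array([
    [1.0,   2.0],
    [1.1, 100.0],
])

rows, cols = linear_sum_assignment(dist)  # pairing that minimizes total distance
total = dist[rows, cols].sum()            # 2.0 + 1.1 = 3.1
```

This shows why optimal matching can beat greedy matching: it is willing to give the first treated subject a slightly worse match to avoid a terrible match later.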

R packages: optmatch, rcbalance

## Assessing matching balance

### Assessing matching balance

- Check covariate balance: compute standardized difference to see if each covariate has similar means between treatment and control
\[
smd = \frac{\bar{X}_{\text{treatment}} - \bar{X}_{\text{control}}}{\sqrt{\frac{s^2_{\text{treatment}} + s^2_{\text{control}}}{2}}}
\]
- Does not depend on sample size
- Often, absolute value of smd is reported
- We calculate smd for each variable we match on
- This analysis does not involve the outcome variable

Rule of thumb:

- \(|smd| < 0.1\): adequate balance
- \(|smd| \in [0.1, 0.2]\): not too alarming
- \(|smd| > 0.2\): serious imbalance
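The standardized mean difference defined above is simple to compute per covariate; a minimal Python sketch (`smd` is a hypothetical helper name):

```python
import numpy as np

def smd(x_treat, x_control):
    """Standardized mean difference: difference in means divided by the
    pooled standard deviation, so it does not depend on sample size."""
    num = np.mean(x_treat) - np.mean(x_control)
    denom = np.sqrt((np.var(x_treat, ddof=1) + np.var(x_control, ddof=1)) / 2)
    return num / denom
```

By the rule of thumb above, an absolute value below 0.1 for every matched covariate indicates adequate balance.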

### Example: right heart catheterization (RHC) data

- Table 1: compares pre-matching and post-matching balance

### Example continued: RHC data

- SMD plot: visualizes Table 1

## Analyze data after matching

### After matching, proceed with outcome analysis

Test for a treatment effect

Estimate a treatment effect and confidence interval

Methods should take matching into account

### Randomization test

Also known as permutation tests or exact tests

Main ideas of the randomization test:

- Compute the test statistic from the observed data
- Under the null hypothesis of no treatment effect, randomly **permute treatment assignment** within pairs and recompute the test statistic
- Repeat many times and see how unusual the observed statistic is
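For a continuous outcome, the paired randomization test can be sketched as follows (an illustrative example, assuming matched pairs and the mean within-pair difference as test statistic):

```python
import numpy as np

def randomization_test(y_treat, y_control, n_perm=5000, seed=0):
    """Paired randomization test under H0 of no treatment effect:
    randomly swap the treated/control labels within each matched pair
    and count how often the permuted statistic is as extreme as observed."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(y_treat, float) - np.asarray(y_control, float)
    observed = diffs.mean()                           # observed test statistic
    count = 0
    for _ in range(n_perm):
        signs = rng.choice([1, -1], size=len(diffs))  # within-pair label swap
        if abs((signs * diffs).mean()) >= abs(observed):
            count += 1
    return count / n_perm                             # two-sided p-value
```

Swapping labels only within pairs is what makes this test take the matching into account.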

### A binary outcome example

Test statistic: the total number of events in the treated group

- Test stat \(T=6\) in the observed data

- In the observed data, the discordant pairs (in red) are the only ones that can change under treatment permutation

### Permutation test for binary outcome: equivalent to a McNemar test

### McNemar test: whether the row and column marginal frequencies are the same, for paired binary data

- Paired binary data, represented in a 2 by 2 contingency table

|  | Test 2 positive | Test 2 negative | Row total |
|---|---|---|---|
| Test 1 positive | \(a\) | \(b\) | \(a+b\) |
| Test 1 negative | \(c\) | \(d\) | \(c+d\) |
| Column total | \(a+c\) | \(b+d\) | \(N\) |

Hypotheses: whether the marginal proportions are equal, \(p_a + p_b = p_a + p_c\), or equivalently, \[ H_0: p_b = p_c, \quad H_1: p_b \neq p_c \]

Test statistic \[ \frac{(b-c)^2}{b+c} \stackrel{\text{under } H_0}{\sim} \chi^2_{df=1} \]
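As an illustration, the statistic and its chi-square p-value can be computed directly from the discordant counts \(b\) and \(c\) (a sketch using SciPy; `mcnemar` is a hypothetical helper name):

```python
from scipy.stats import chi2

def mcnemar(b, c):
    """McNemar test from the discordant pair counts b and c.
    Returns the chi-square statistic and its p-value (df = 1)."""
    stat = (b - c) ** 2 / (b + c)
    return stat, chi2.sf(stat, df=1)  # upper-tail chi-square probability
```

Note that the concordant counts \(a\) and \(d\) do not enter the statistic, matching the observation above that only discordant pairs can change under permutation.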

### Permutation test for continuous outcome: equivalent to a paired t-test

- Test statistic: difference in sample means

### Other outcome models

Conditional logistic regression

- Matched binary outcome data

Generalized estimating equations (GEE)

- Match ID variable used to specify clusters
- For binary outcomes, can estimate a causal risk difference, causal risk ratio, or causal odds ratio (depending on link function)

# Propensity Score

### Propensity score

Propensity score: probability of receiving treatment, given covariates \(X\) \[ \pi_i = P(A_i = 1 \mid X_i) \]

Propensity score is a balancing score \[ P(X = x \mid \pi(X) = p, A = 1) = P(X = x \mid \pi(X) = p, A = 0) \]

- Suppose two subjects have the same propensity score but different covariate values \(X\)
- Given the propensity score, each subject’s \(X\) is just as likely to be found in the treated group as in the control group
- So if we restrict to a subpopulation of subjects who share the same propensity score, the covariates should be balanced between the treatment and control groups

We can match on propensity score to achieve balance

### Logistic regression to estimate propensity score

In a randomized trial, the propensity score is known by design; e.g., with a fair coin toss, \[P(A=1\mid X) = P(A=0\mid X) = 0.5\]

In an observational study, we need to estimate the propensity score \(P(A=1\mid X)\)

- Fit a logistic regression: outcome \(A\), covariates \(X\)
- Get the predicted probability (fitted value) for each subject as the estimated propensity score
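A sketch of these two steps in plain NumPy, fitting the logistic regression by Newton-Raphson (an illustrative implementation, not the course's code; the small ridge term is an assumption added for numerical stability):

```python
import numpy as np

def estimate_propensity(X, A, n_iter=25):
    """Estimate P(A = 1 | X) by logistic regression fitted with Newton-Raphson.
    X: (n, p) covariate matrix; A: (n,) 0/1 treatment indicator."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])      # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Xd @ beta))       # current fitted probabilities
        W = p * (1 - p)
        grad = Xd.T @ (A - p)                  # score (gradient of log-likelihood)
        H = Xd.T @ (Xd * W[:, None])           # observed Fisher information
        beta += np.linalg.solve(H + 1e-8 * np.eye(len(beta)), grad)
    return 1 / (1 + np.exp(-Xd @ beta))        # fitted values = estimated scores
```

The returned fitted probabilities are the estimated propensity scores, one per subject.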

## Propensity score matching

### Before propensity score matching: check for overlap

Propensity score matching is simple; it’s matching on one variable

After the propensity score is estimated, but before matching, it is useful to look for overlap

- This is to check positivity assumption

Example of good overlap

### Trim tails if there is a lack of overlap

- Example of bad overlap

Trim tails: remove subjects who have extreme values of propensity score

- Remove control subjects whose propensity score is less than the minimum in the treatment group
- Remove treated subjects whose propensity score is greater than the maximum in the control group
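The two trimming rules above, sketched in Python (`trim_tails` is a hypothetical helper returning a boolean mask of subjects kept in the region of overlap):

```python
import numpy as np

def trim_tails(ps, A):
    """Keep only subjects in the region of propensity-score overlap:
    drop controls below the treated minimum and treated above the control maximum.
    ps: estimated propensity scores; A: 0/1 treatment indicator."""
    ps, A = np.asarray(ps), np.asarray(A)
    lo = ps[A == 1].min()   # minimum propensity score among the treated
    hi = ps[A == 0].max()   # maximum propensity score among the controls
    return ((A == 0) & (ps >= lo)) | ((A == 1) & (ps <= hi))
```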

### Propensity score matching

Compute a distance between the propensity score of each treated subject and that of every control

Use nearest neighbor or optimal matching

In practice, the logit of the propensity score is often used rather than the propensity score itself, since the logit spreads out values near 0 and 1

A caliper can be used to avoid bad matches

- After matching: outcome analysis
- Randomization tests
- Conditional logistic regression, GEE, etc
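Putting these steps together: a hypothetical greedy propensity-score matcher on the logit scale with a caliper. The default caliper of 0.2 standard deviations of the logit is an assumption (a common choice in practice, not stated in the slides):

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def ps_match(ps_treat, ps_control, caliper_sd=0.2):
    """Greedy 1:1 matching on the logit of the propensity score, with a
    caliper of `caliper_sd` standard deviations of the pooled logits.
    Returns a dict mapping treated index -> matched control index."""
    lt, lc = logit(np.asarray(ps_treat)), logit(np.asarray(ps_control))
    caliper = caliper_sd * np.std(np.concatenate([lt, lc]))
    available = set(range(len(lc)))
    matches = {}
    for i in range(len(lt)):
        if not available:
            break
        j = min(available, key=lambda c: abs(lt[i] - lc[c]))
        if abs(lt[i] - lc[j]) <= caliper:
            matches[i] = j
            available.remove(j)
        # otherwise treated subject i stays unmatched (no control in caliper)
    return matches
```

Treated subjects left unmatched by the caliper are the ones flagged earlier as possible positivity violations.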

### References

Coursera class: “A Crash Course in Causality: Inferring Causal Effects from Observational Data”, by Jason A. Roy (University of Pennsylvania)