
# Inverse Probability of Treatment Weighting

### Motivating example

- Suppose there is a single confounder \(X\), with propensity scores \[ P(A=1\mid X=1) = 0.1, \quad P(A=1\mid X=0) = 0.8 \]

In propensity score matching, among subjects with \(X=1\) there are 9 controls for every treated subject, so only 1 out of every 9 controls will be matched to a treated subject

Thus, 1 person in the treated group counts the same as 9 people from the control group

So rather than matching, we could use all the data, but down-weight each control subject to count as just 1/9 of a treated subject

### Inverse probability of treatment weighting (IPTW)

IPTW weights: inverse of the probability of treatment received

- For treated subjects, weight by \(1/P(A=1\mid X)\)
- For control subjects, weight by \(1/P(A=0\mid X)\)

In the previous example

For \(X=1\), the weight for a treated subject is \(1/0.1 = 10\), and the weight for a control subject is \(1/0.9 = \frac{10}{9}\)

For \(X=0\), the weight for a treated subject is \(1/0.8 = \frac{5}{4}\), and the weight for a control subject is \(1/0.2 = 5\)
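The weights in this example can be reproduced with a few lines of NumPy. This is a minimal sketch: the helper name `iptw_weights` and the four hypothetical subjects are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def iptw_weights(a, ps):
    """Return 1/ps for treated subjects (a == 1) and 1/(1 - ps) for controls (a == 0)."""
    a = np.asarray(a)
    ps = np.asarray(ps, dtype=float)
    return np.where(a == 1, 1.0 / ps, 1.0 / (1.0 - ps))

# Four hypothetical subjects, one per (X, A) combination:
# (X=1, treated), (X=1, control), (X=0, treated), (X=0, control)
x = np.array([1, 1, 0, 0])
a = np.array([1, 0, 1, 0])
ps = np.where(x == 1, 0.1, 0.8)  # P(A=1 | X) from the motivating example

weights = iptw_weights(a, ps)
print(weights)  # approximately 10, 10/9, 5/4, 5
```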

Motivation: in survey sampling, it is common to oversample some subpopulations, and then use the Horvitz-Thompson estimator to estimate population means

### Pseudo population

IPTW creates a pseudo-population where treatment assignment no longer depends on \(X\)

- So there is no confounding in the pseudo-population

In the original population, some people were more likely to get treated based on their \(X\)’s

In the pseudo-population, everyone is equally likely to get treated, regardless of their \(X\)’s
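To see this numerically, the sketch below builds a hypothetical cohort of 200 subjects whose treated counts match the propensity scores of the motivating example exactly, then computes the weighted share of treated subjects within each level of \(X\). The cohort size and the helper `weighted_treated_share` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical cohort: 100 subjects with X=1 and 100 with X=0, with treated
# counts chosen so that P(A=1|X=1)=0.1 and P(A=1|X=0)=0.8 hold exactly.
x = np.repeat([1, 0], 100)
a = np.concatenate([np.repeat([1, 0], [10, 90]), np.repeat([1, 0], [80, 20])])
ps = np.where(x == 1, 0.1, 0.8)
w = np.where(a == 1, 1 / ps, 1 / (1 - ps))

def weighted_treated_share(stratum):
    """Share of the pseudo-population with X == stratum that is treated."""
    in_stratum = x == stratum
    return w[in_stratum & (a == 1)].sum() / w[in_stratum].sum()

share_x1 = weighted_treated_share(1)
share_x0 = weighted_treated_share(0)
print(share_x1, share_x0)
```

In the original cohort the treated shares are 0.1 and 0.8; after weighting, both strata have the same treated share, so treatment no longer depends on \(X\).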

### Estimation with IPTW

We can estimate \(E(Y^1)\) as below \[ \frac{\sum_{i=1}^n \frac{1}{\pi_i} A_i Y_i} {\sum_{i=1}^n \frac{1}{\pi_i} A_i } \]

- where \(\pi_i = P(A_i=1|X_i)\) is the propensity score
- The numerator is the sum of \(Y\)’s in treated pseudo-population
- The denominator is the number of subjects in treated pseudo-population

We can estimate \(E(Y^0)\) as below \[ \frac{\sum_{i=1}^n \frac{1}{1-\pi_i} (1-A_i) Y_i} {\sum_{i=1}^n \frac{1}{1-\pi_i} (1-A_i) } \]

Average treatment effect: \(E(Y^1) - E(Y^0)\)
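The two ratio estimators above translate directly into code. The following is a sketch under the assumption that the propensity scores are already available; the function name `iptw_means` and the toy data are hypothetical.

```python
import numpy as np

def iptw_means(y, a, ps):
    """IPTW estimates of E(Y^1), E(Y^0), and their difference."""
    y, a, ps = (np.asarray(v, dtype=float) for v in (y, a, ps))
    w1 = a / ps                # weights in the treated pseudo-population
    w0 = (1 - a) / (1 - ps)    # weights in the control pseudo-population
    mean_y1 = np.sum(w1 * y) / np.sum(w1)
    mean_y0 = np.sum(w0 * y) / np.sum(w0)
    return mean_y1, mean_y0, mean_y1 - mean_y0

# Toy data (hypothetical): four subjects with known propensity scores
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
ps = [0.1, 0.1, 0.8, 0.8]
mean_y1, mean_y0, ate = iptw_means(y, a, ps)
print(mean_y1, mean_y0, ate)
```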

# Marginal Structural Models

### Marginal structural models

Marginal structural models (MSM): a model for the mean of the potential outcomes

- **Marginal**: not conditional on the confounders (population average)
- **Structural**: for potential outcomes, not observed outcomes

### Linear MSM and logistic MSM

Linear MSM \[ E(Y^a) = \psi_0 + \psi_1 a, \quad a = 0, 1 \]

- \(E(Y^0) = \psi_0\), \(E(Y^1) = \psi_0 + \psi_1\)
- So the **average causal effect** is \[E(Y^1) - E(Y^0) = \psi_1\]

Logistic MSM \[ \text{logit}\{E(Y^a)\} = \psi_0 + \psi_1 a, \quad a = 0, 1 \]

- So the **causal odds ratio** is \[ \frac{\frac{P(Y^1=1)}{1-P(Y^1=1)}}{\frac{P(Y^0=1)}{1-P(Y^0=1)}} = \exp(\psi_1) \]

### MSM with effect modification

Suppose \(V\) is a variable that modifies the effect of \(A\)

A linear MSM with effect modification \[ E(Y^a \mid V) = \psi_0 + \psi_1 a + \psi_3V + \psi_4 a V, \quad a = 0, 1 \]

- So the average causal effect at a given level of \(V\) is \[E(Y^1 \mid V) - E(Y^0 \mid V) = \psi_1 + \psi_4 V\]

General MSM \[ g\{E(Y^a \mid V)\} = h(a, V; \psi) \]

- \(g()\): link function
- \(h()\): a function specifying the parametric form of \(a\) and \(V\) (typically additive, linear)

### MSM estimation using pseudo-population

Because of confounding, the MSM \[ g\{E(Y^a)\} = \psi_0 + \psi_1 a \] is different from the GLM (generalized linear model) \[ g\{E(Y_i \mid A_i)\} = \psi_0 + \psi_1 A_i \]

Pseudo-population (obtained from IPTW) is free of confounding

**We therefore estimate MSM by solving GLM with IPTW**
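The difference between the two models can be illustrated on a hypothetical cohort with a noiseless outcome \(Y = 1 + 2A + 3X\), so the true causal effect is 2. The sketch below fits the same linear model twice, once unweighted (the ordinary GLM) and once with IPTW (the MSM). The helper `fit_wls` and the cohort are assumptions for illustration only.

```python
import numpy as np

def fit_wls(y, a, w):
    """Weighted least squares for the model psi0 + psi1 * a."""
    design = np.column_stack([np.ones(len(a)), a])
    # Solve the weighted normal equations (X' W X) psi = X' W y.
    xtwx = design.T @ (design * w[:, None])
    xtwy = design.T @ (w * y)
    return np.linalg.solve(xtwx, xtwy)

# Hypothetical cohort matching P(A=1|X=1)=0.1 and P(A=1|X=0)=0.8 exactly
x = np.repeat([1, 0], 100)
a = np.concatenate([np.repeat([1, 0], [10, 90]), np.repeat([1, 0], [80, 20])])
y = 1.0 + 2.0 * a + 3.0 * x          # true causal effect of A is 2
ps = np.where(x == 1, 0.1, 0.8)
iptw = np.where(a == 1, 1 / ps, 1 / (1 - ps))

psi_naive = fit_wls(y, a, np.ones(len(a)))  # unweighted regression of Y on A
psi_msm = fit_wls(y, a, iptw)               # same regression with IPTW
print(psi_naive[1], psi_msm[1])
```

In this cohort the unweighted slope is confounded by \(X\) and even has the wrong sign, while the weighted slope recovers the effect of 2.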

### MSM estimation steps

Estimate propensity score, using logistic regression

Create weights

- Inverse of propensity score for treated subjects
- Inverse of one minus propensity score for control subjects

Specify the MSM of interest

Use software to fit a weighted generalized linear model

Use asymptotic (sandwich) variance estimator

- This accounts for the fact that the pseudo-population can be larger than the actual sample
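The steps above can be sketched end to end with NumPy alone. This is an illustrative sketch, not a reference implementation: the data are simulated, the function names are hypothetical, the logistic regression is a bare Newton-Raphson fit, and the sandwich variance treats the weights as fixed (it ignores that the propensity scores were estimated). In practice, dedicated statistical software would usually be used for the weighted GLM and the robust variance.

```python
import numpy as np

def fit_logistic(features, a, n_iter=25):
    """Fitted P(A=1 | X) from logistic regression, via Newton-Raphson."""
    design = np.column_stack([np.ones(len(a)), features])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (a - p)
        hessian = design.T @ (design * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-design @ beta))

def weighted_linear_msm(y, a, w):
    """Weighted least squares for E(Y^a) = psi0 + psi1 * a, with sandwich SEs."""
    design = np.column_stack([np.ones(len(a)), a])
    bread = design.T @ (design * w[:, None])
    psi = np.linalg.solve(bread, design.T @ (w * y))
    resid = y - design @ psi
    scores = design * (w * resid)[:, None]   # per-subject estimating functions
    meat = scores.T @ scores
    bread_inv = np.linalg.inv(bread)
    cov = bread_inv @ meat @ bread_inv       # sandwich: bread^-1 meat bread^-1
    return psi, np.sqrt(np.diag(cov))

# Simulated data (assumption): one binary confounder, true effect of A is 2
rng = np.random.default_rng(0)
n = 5000
x = rng.binomial(1, 0.5, size=n)
a = rng.binomial(1, np.where(x == 1, 0.1, 0.8))
y = 1.0 + 2.0 * a + 3.0 * x + rng.normal(size=n)

ps = fit_logistic(x, a)                          # step 1: propensity score
w = np.where(a == 1, 1 / ps, 1 / (1 - ps))       # step 2: IPTW weights
psi, se = weighted_linear_msm(y, a, w)           # steps 3-5: weighted fit + sandwich SE
print(psi[1], se[1])
```

Multiplying all weights by a constant inflates the pseudo-population but leaves the sandwich standard errors unchanged, which is why this estimator is preferred over the naive model-based standard errors.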

### Bootstrap

We may also use bootstrap to estimate standard error

Bootstrap steps

1. Randomly sample with replacement from the original sample
2. Estimate parameters
3. Repeat steps 1 and 2 many times
4. Use the standard deviation of the bootstrap estimates as an estimate of the standard error
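A minimal sketch of these bootstrap steps follows, assuming a single binary confounder so that the propensity score can be re-estimated in each resample as the treated proportion within each level of \(X\). The simulated data and the names `iptw_ate` and `bootstrap_se` are hypothetical.

```python
import numpy as np

def iptw_ate(y, a, x):
    """IPTW estimate of E(Y^1) - E(Y^0) with a binary confounder x.

    The propensity score is estimated as the treated proportion within
    each level of x (a saturated model, used here for simplicity).
    """
    ps = np.where(x == 1, a[x == 1].mean(), a[x == 0].mean())
    w1, w0 = a / ps, (1 - a) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def bootstrap_se(y, a, x, n_boot=500, seed=0):
    """Standard deviation of the estimate over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                    # step 1: resample with replacement
        estimates[b] = iptw_ate(y[idx], a[idx], x[idx])     # step 2: re-estimate
    return estimates.std(ddof=1)                            # step 4: SD of the estimates

# Simulated data (assumption): true effect of A is 2
rng = np.random.default_rng(1)
n = 2000
x = rng.binomial(1, 0.5, size=n)
a = rng.binomial(1, np.where(x == 1, 0.1, 0.8))
y = 1.0 + 2.0 * a + 3.0 * x + rng.normal(size=n)

ate = iptw_ate(y, a, x)
se = bootstrap_se(y, a, x)
print(ate, se)
```

Re-estimating the propensity score inside each resample lets the bootstrap reflect the uncertainty from the weights as well as from the outcome model.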

## Assessing covariate balance with weights

### Covariate balance check with standardized differences

Covariate balance can be checked on the weighted sample using the **standardized difference** \[ smd = \frac{\bar{X}_{\text{treatment}} - \bar{X}_{\text{control}}}{\sqrt{\frac{s^2_{\text{treatment}} + s^2_{\text{control}}}{2}}} \]

- Weighted means \(\bar{X}_{\text{treatment}}\), \(\bar{X}_{\text{control}}\)
- Weighted variances \(s^2_{\text{treatment}}\), \(s^2_{\text{control}}\)
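The weighted standardized difference can be computed as in the sketch below. One assumption to note: the weighted variance here is the plain weighted average of squared deviations; statistical packages may use a different denominator, so small numerical differences are possible. The function name `weighted_smd` and the cohort are hypothetical.

```python
import numpy as np

def weighted_smd(x, a, w):
    """Standardized difference of covariate x between arms, using weights w."""
    x, a, w = (np.asarray(v, dtype=float) for v in (x, a, w))

    def mean_and_var(mask):
        m = np.average(x[mask], weights=w[mask])
        v = np.average((x[mask] - m) ** 2, weights=w[mask])
        return m, v

    m1, v1 = mean_and_var(a == 1)
    m0, v0 = mean_and_var(a == 0)
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

# Hypothetical cohort matching P(A=1|X=1)=0.1 and P(A=1|X=0)=0.8 exactly
x = np.repeat([1, 0], 100)
a = np.concatenate([np.repeat([1, 0], [10, 90]), np.repeat([1, 0], [80, 20])])
ps = np.where(x == 1, 0.1, 0.8)
iptw = np.where(a == 1, 1 / ps, 1 / (1 - ps))

smd_before = weighted_smd(x, a, np.ones(len(a)))  # unweighted sample
smd_after = weighted_smd(x, a, iptw)              # IPTW pseudo-population
print(smd_before, smd_after)
```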

### Balance check tools

- Table 1

- SMD plot

### If imbalance after weighting

Refine propensity score model

- Interactions
- Non-linearity

Then reassess balance

## Problems and remedies for large weights

### Larger weights lead to more noise

For a subject with a large weight, its outcome data can greatly affect parameter estimation

A subject with a large weight can also affect standard error estimation via the bootstrap, depending on whether that subject is selected in a given resample

An extremely large weight means the probability of receiving that treatment is very small, and thus signals a potential violation of the positivity assumption

### Check weights via plots and summary statistics

- Investigate very large weights: identify the subjects with large weights and find what’s unusual about them
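A simple numeric check might look like the following sketch, which reports a few summary statistics and the indices of the largest weights so those subjects can be inspected. The function name `summarize_weights`, the chosen statistics, and the example weights are assumptions.

```python
import numpy as np

def summarize_weights(w, top_k=5):
    """Return summary statistics of the weights and indices of the top_k largest."""
    w = np.asarray(w, dtype=float)
    summary = {
        "min": w.min(),
        "median": np.median(w),
        "mean": w.mean(),
        "p99": np.percentile(w, 99),
        "max": w.max(),
    }
    largest = np.argsort(w)[::-1][:top_k]  # indices sorted from largest weight down
    return summary, largest

# Hypothetical weights with one extreme value
w_example = np.array([1.1, 1.2, 50.0, 1.3, 2.0])
summary, largest = summarize_weights(w_example, top_k=2)
print(summary)
print(largest)  # row positions of the subjects to inspect
```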

### Option 1: trimming the tails

Large weights: occur in the tails of the propensity score distribution

Trim the tails to eliminate some extreme weights

- Remove treated subjects whose propensity scores are above the 98th percentile of the propensity score distribution among controls
- Remove control subjects whose propensity scores are below the 2nd percentile of the propensity score distribution among treated

Note: trimming the tails changes the population
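The trimming rule above can be sketched as a boolean mask over subjects. The function name `trim_tails`, its default percentiles, and the example propensity scores are illustrative assumptions.

```python
import numpy as np

def trim_tails(ps, a, lower_pct=2, upper_pct=98):
    """Return a boolean mask of subjects to keep after trimming the tails."""
    ps = np.asarray(ps, dtype=float)
    a = np.asarray(a)
    upper = np.percentile(ps[a == 0], upper_pct)  # 98th percentile among controls
    lower = np.percentile(ps[a == 1], lower_pct)  # 2nd percentile among treated
    drop = ((a == 1) & (ps > upper)) | ((a == 0) & (ps < lower))
    return ~drop

# Hypothetical propensity scores: controls concentrated low, treated high
ps = np.concatenate([np.linspace(0.01, 0.5, 50), np.linspace(0.3, 0.99, 50)])
a = np.repeat([0, 1], 50)
keep = trim_tails(ps, a)
print(keep.sum(), "of", len(keep), "subjects kept")
```

After trimming, the propensity score model and the weights are typically refit on the kept subjects, since the target population has changed.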

### Option 2: truncating the weights

Another option to deal with large weights is truncation

Weight truncation steps

- Determine a maximum allowable weight
  - Can be a specific value (e.g., 100)
  - Can be based on a percentile (e.g., 99th)
- If a weight is greater than the maximum allowable value, set it to the maximum allowable value
- Bias-variance trade-off
  - Truncation: bias, but smaller variance
  - No truncation: unbiased, larger variance

Truncating extremely large weights can result in estimators with lower MSE
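Both ways of choosing the maximum allowable weight are shown in the sketch below. The function name `truncate_weights` and its parameters are hypothetical.

```python
import numpy as np

def truncate_weights(w, max_weight=None, percentile=None):
    """Cap weights at a fixed value, or at a percentile of the weights."""
    w = np.asarray(w, dtype=float)
    if max_weight is None:
        if percentile is None:
            raise ValueError("Provide either max_weight or percentile")
        max_weight = np.percentile(w, percentile)
    return np.minimum(w, max_weight)

# Cap at a specific value
capped_fixed = truncate_weights([1.0, 5.0, 200.0], max_weight=100)

# Cap at the 99th percentile of the weights themselves
w_example = np.arange(1.0, 102.0)  # hypothetical weights 1, 2, ..., 101
capped_pct = truncate_weights(w_example, percentile=99)
print(capped_fixed, capped_pct.max())
```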

### References

Coursera class: “A Crash Course on Causality: Inferring Causal Effects from Observational Data”, by Jason A. Roy (University of Pennsylvania)