Inverse Probability of Treatment Weighting
Marginal Structural Models
- Assessing covariate balance with weights
- Problems and remedies for large weights

For the pdf slides, click here

Inverse Probability of Treatment Weighting

Motivating example

Suppose there is a single confounder \(X\), with propensity scores \[ P(A=1\mid X=1) = 0.1, \quad P(A=1\mid X=0) = 0.8 \]

In propensity score matching, for subjects with \(X=1\), 1 out of 9 controls will be matched to the treated
- Thus, 1 person in the treated group counts the same as 9 people from the control group
- So rather than matching, we could use all data, but down-weight each control subject to be just 1/9 of the treated subject

Inverse probability of treatment weighting (IPTW)

IPTW weights: inverse of the probability of treatment received
- For treated subjects, weight by \(1/P(A=1\mid X)\)
- For control subjects, weight by \(1/P(A=0\mid X)\)
In the previous example
- For \(X=1\), the weight for a treated subject is \(1/0.1 = 10\), and the weight for a control subject is \(1/0.9 = \frac{10}{9}\)
- For \(X=0\), the weight for a treated subject is \(1/0.8 = \frac{5}{4}\), and the weight for a control subject is \(1/0.2 = 5\)
Motivation: in survey sampling, it is common to oversample some subpopulation, and then use Horvitz-Thompson estimator to estimate population means

Pseudo population

IPTW creates a pseudo-population where treatment assignment no longer depend on \(X\)
- So there is no confounding in the pseudo-population

In the original population, some people were more likely to get treated based on their \(X\)’s
In the pseudo-population, everyone is equally likely to get treated, regardless of their \(X\)’s

Estimation with IPTW

We can estimate \(E(Y^1)\) as below \[ \frac{\sum_{i=1}^n \frac{1}{\pi_i} A_i Y_i} {\sum_{i=1}^n \frac{1}{\pi_i} A_i } \]
- where \(\pi_i = P(A_i=1|X_i)\) is the propensity score
- The numerator is the sum of \(Y\)’s in treated pseudo-population
- The denominator is the number of subjects in treated pseudo-population
We can estimate \(E(Y^0)\) as below \[ \frac{\sum_{i=1}^n \frac{1}{1-\pi_i} (1-A_i) Y_i} {\sum_{i=1}^n \frac{1}{1-\pi_i} (1-A_i) } \]
Average treatment effect: \(E(Y^1) - E(Y^0)\)

Marginal Structural Models

Marginal structural models

Marginal structural models (MSM): a model for the mean of the potential outcomes
Marginal: not conditional on the confounders (population average)
Structural: for potential outcomes, not observed outcomes

Linear MSM and logistic MSM

Linear MSM \[ E(Y^a) = \psi_0 + \psi_1 a, \quad a = 0, 1 \]
- \(E(Y^0) = \psi_0\), \(E(Y^0) = \psi_0 + \psi_1\)
- So the average causal effect \[E(Y^1) - E(Y^0) = \psi_1\]
Logistic MSM \[ logit\{E(Y^a)\} = \psi_0 + \psi_1 a, \quad a = 0, 1 \]
- So the causal odds ratio \[ \frac{\frac{P(Y^1=1)}{1-P(Y^1=1)}}{\frac{P(Y^0=1)}{1-P(Y^0=1)}} = \psi_1 \]

MSM with effect modification

Suppose \(V\) is a variable that modifies the effect of \(A\)
A linear MSM with effect modification \[ E(Y^a \mid V) = \psi_0 + \psi_1 a + \psi_3V + \psi_4 a V, \quad a = 0, 1 \]
- So the average causal effect \[E(Y^1) - E(Y^0) = \psi_1 + \psi_4 V\]
General MSM \[ g\{E(Y^a \mid V)\} = h(a, V; \psi) \]
- \(g()\): link function
- \(h()\): a function specifying parametric from of \(a\) and \(V\) (typically additive, linear)

MSM estimation using pseudo-population

Because of confounding, MSM \[ g\{E(Y^a \mid V)\} = \psi_0 + \psi_1 a \] is difference from GLM (generalized linear model) \[ g\{E(Y_i \mid A_i)\} = \psi_0 + \psi_1 A_i \]
Pseudo-population (obtained from IPTW) is free of confounding
- We therefore estimate MSM by solving GLM with IPTW

MSM estimation steps

Estimate propensity score, using logistic regression
Create weights
- Inverse of propensity score for treated subjects
- Inverse of one minus propensity score for control subjects
Specify the MSM of interest
Use software to fit a weighted generalized linear model
Use asymptotic (sandwich) variance estimator
- This accounts for fact that pseudo-population might be larger than sample size

Bootstrap

We may also use bootstrap to estimate standard error
Bootstrap steps
1. Randomly sample with replacement from the original sample
2. Estimate parameters
3. Repeat steps 1 and 2 many times
4. Use the standard deviation of the bootstrap estimates as an estimate of the standard error

Assessing covariate balance with weights

Covariate balance check with standardized differences

Covariate balance: can be checked on the weighted sample using standardized difference \[ smd = \frac{\bar{X}_{\text{treatment}} - \bar{X}_{\text{control}}}{\sqrt{\frac{s^2_{\text{treatment}} + s^2_{\text{control}}}{2}}} \]
- Weighted means \(\bar{X}_{\text{treatment}}\), \(\bar{X}_{\text{control}}\)
- Weighted variances \(s^2_{\text{treatment}}\), \(s^2_{\text{control}}\)

Balance check tools

Table 1

SMD plot

If imbalance after weighting

Refine propensity score model
- Interactions
- Non-linearity
Then reaccess balance

Problems and remedies for large weights

Larger weights lead to more noise

For an object with a large weight, its outcome data can greatly affect parameter estimation
An object with large weight can also affect standard error estimation, via bootstrap, depending on whether the object is selected or not
An extremely large weights means the probability of that treatment is very small, thus a potential violation of the positivity assumption

Check weights via plots and summary statistics

Investigate very large weights: identify the subjects with large weights and find what’s unusual about them

Option 1: trimming the tails

Large weights: occur in the tails of the propensity score distribution
Trim the tails to eliminate some extreme weights
- Remove treated subjects whose propensity scores are above the 98th percentile from the distribution among controls
- Remove control subjects whose propensity scores are below the 2nd percentile from the distribution among treated
Note: trimming the tails changes the population

Option 2: truncating the weights

Another option to deal with large weights is truncation
Weight truncation steps
1. Determine a maximum allowable weight
- Can be a specific value (e.g., 100)
- Can based on a percentile (e.g., 99th)
1. If a weight is greater than the maximum allowable, set it to the maximum allowable value
Bias-variance trade-off
- Truncation: bias, but smaller variance
- No truncation: unbiased, larger variance
Truncating extremely large weights can result in estimators with lower MSE

References

Coursera class: “A Crash Course on Causality: Inferring Causal Effects from Observational Data”, by Jason A. Roy (University of Pennsylvania)
- https://www.coursera.org/learn/crash-course-in-causality

Course Notes: A Crash Course on Causality -- Week 4: Inverse Probability of Treatment Weighting (IPTW)

Inverse Probability of Treatment Weighting

Motivating example

Inverse probability of treatment weighting (IPTW)

Pseudo population

Estimation with IPTW

Marginal Structural Models

Marginal structural models

Linear MSM and logistic MSM

MSM with effect modification

MSM estimation using pseudo-population

MSM estimation steps

Bootstrap

Assessing covariate balance with weights

Covariate balance check with standardized differences

Balance check tools

If imbalance after weighting

Problems and remedies for large weights

Larger weights lead to more noise

Check weights via plots and summary statistics

Option 1: trimming the tails

Option 2: truncating the weights

References