Book Notes: Statistical Analysis with Missing Data -- Ch3 Complete Case Analysis and Weighting Methods

Complete-case (CC) analysis

  • Complete-case (CC) analysis: use only data points (units) where all variables are observed

  • Loss of information in CC analysis:

    • Loss of precision (larger variance)
    • Bias, when the missingness mechanism is not MCAR. In this case, the complete units are not a random sample of the population
  • In these notes, I will focus on the bias issue

    • Adjusting for the CC analysis bias using weights
    • This idea is closely related to weighting in randomization-based inference for finite-population surveys

Weighted Complete-Case Analysis

Notations

  • Population size \(N\), sample size \(n\)
  • Number of variables (items): \(K\)
  • Data: \(Y=(y_{ij})\), where \(i = 1, \ldots, N\) and \(j = 1, \ldots, K\)
  • Design information (about sampling or missingness): \(Z\)
  • Sample indicator: \(I = (I_1, \ldots, I_N)'\); for unit \(i\), \[ I_i = \mathbf{1}_{\{\text{unit } i \text{ included in the sample}\}} \]

  • Sample selection processes can be characterized by a distribution for \(I\) given \(Y\) and \(Z\).

Probability sampling

  • Properties of probability sampling

    1. Unconfounded: selection doesn’t depend on \(Y\), i.e., \[ f(I \mid Y, Z) = f(I \mid Z) \]

    2. Every unit has a positive (known) probability of selection \[ \pi_i = P(I_i = 1 \mid Z) > 0, \quad \text{for all } i \]

  • In an equal-probability sampling design, \(\pi_i\) is the same for all \(i\)

Stratified random sampling

  • \(Z\) is a variable defining strata. Suppose Stratum \(Z=j\) has \(N_j\) units in total, for \(j= 1, \ldots, J\)

  • In Stratum \(j\), stratified random sampling takes a simple random sample of \(n_j\) units

  • The distribution of \(I\) under stratified random sampling is \[ f(I \mid Z) = \prod_{j=1}^J {N_j \choose n_j}^{-1} \]

Example: estimating population mean \(\bar{Y}\)

  • An unbiased estimate is the stratified sample mean \[ \bar{y}_{\text{st}} = \frac{\sum_{j=1}^J N_j \bar{y}_j}{N} \] where \(\bar{y}_j\) is the sample mean in stratum \(j\)

  • Sampling variance approximation \[ v(\bar{y}_{\text{st}}) \approx \frac{1}{N^2} \sum_{j=1}^J N_j^2 \left(\frac{1}{n_j} - \frac{1}{N_j} \right)s_j^2 \] where \(s_j^2\) is the sample variance of \(Y\) in stratum \(j\)

  • A large-sample 95% confidence interval for \(\bar{Y}\) is \[ \bar{y}_{\text{st}} \pm 1.96 \sqrt{v(\bar{y}_{\text{st}})} \]
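
A minimal numpy sketch of these formulas (the function name and inputs are illustrative, not from the book):

```python
import numpy as np

def stratified_mean_ci(strata_samples, N_j, z=1.96):
    """Stratified sample mean, its variance approximation, and a
    large-sample 95% CI for the population mean. `strata_samples` is a
    list of arrays (one simple random sample per stratum); `N_j` holds
    the stratum population sizes."""
    N_j = np.asarray(N_j, dtype=float)
    N = N_j.sum()
    n_j = np.array([len(s) for s in strata_samples])
    ybar_j = np.array([np.mean(s) for s in strata_samples])
    s2_j = np.array([np.var(s, ddof=1) for s in strata_samples])

    ybar_st = np.sum(N_j * ybar_j) / N                          # stratified mean
    v = np.sum(N_j**2 * (1.0 / n_j - 1.0 / N_j) * s2_j) / N**2  # variance approx.
    half = z * np.sqrt(v)
    return ybar_st, v, (ybar_st - half, ybar_st + half)
```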

Weighting methods

  • Main idea: a unit selected with probability \(\pi_i\) is “representing” \(\pi_i^{-1}\) units in the population, and hence should be given weight \(\pi_i^{-1}\).

  • For example, in a stratified random sample

    • A selected unit \(i\) in stratum \(j\) represents \(N_j/n_j\) population units
    • Thus, following the Horvitz-Thompson idea, the population mean can be estimated by the weighted mean \[ \bar{y}_w = \frac{1}{n}\sum_{i=1}^n w_i y_i, \quad \pi_i = \frac{n_j}{N_j}, \quad w_i = n \cdot \frac{\pi_i^{-1}}{\sum_k \pi_k^{-1}} \]
    • It is not hard to show that \[ \bar{y}_w = \bar{y}_{\text{st}} \]
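    • To see why: \(\sum_k \pi_k^{-1} = \sum_{j=1}^J n_j \cdot \frac{N_j}{n_j} = N\), so \(w_i = n \pi_i^{-1} / N\) and \[ \bar{y}_w = \frac{1}{N}\sum_{i=1}^n \pi_i^{-1} y_i = \frac{1}{N}\sum_{j=1}^J \frac{N_j}{n_j} \sum_{i \in \text{stratum } j} y_i = \frac{\sum_{j=1}^J N_j \bar{y}_j}{N} = \bar{y}_{\text{st}} \]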

Weighting with nonresponses

  • If the probability of selecting unit \(i\) is \(\pi_i\), and the probability that unit \(i\) responds (given selection) is \(\phi_i\), then \[ P(\text{unit } i \text{ is observed}) = \pi_i \phi_i \]

  • Suppose there are \(r\) units observed (respondents). Then the weighted estimate for \(\bar{Y}\) is \[ \bar{y}_w = \frac{1}{r} \sum_{i=1}^r w_i y_i, \quad w_i = r \cdot \frac{(\pi_i \phi_i)^{-1}}{\sum_k (\pi_k \phi_k)^{-1}} \]

  • Usually \(\phi_i\) is unknown and thus needs to be estimated

Weighting class estimator

  • Weighting class adjustments are used primarily to handle unit nonresponse

  • Suppose we partition the sample into \(J\) “weighting classes”. In the weighting class \(C = j\):

    • \(n_j\): the sample size
    • \(r_j\): the number of respondents
    • A simple estimator for \(\phi_j\) is \(\hat{\phi}_j = \frac{r_j}{n_j}\)
  • For equal probability designs, where \(\pi_i\) is constant, the weighting class estimator is \[ \bar{y}_{\text{wc}} = \frac{1}{n}\sum_{j=1}^J n_j \bar{y}_{j\text{R}} \] where \(\bar{y}_{j\text{R}}\) is the respondent mean in class \(j\)

  • The estimator is unbiased under the following form of the MAR assumption (quasirandomization): the data are MCAR within each weighting class \(j\)
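
A minimal sketch of this estimator for an equal-probability design (function and variable names are illustrative):

```python
import numpy as np

def weighting_class_mean(y, observed, cls):
    """Weighting class estimator: (1/n) * sum_j n_j * ybar_jR, where
    ybar_jR is the respondent mean in class j. `observed` is a boolean
    mask and `cls` gives each sampled unit's weighting class."""
    y, observed, cls = map(np.asarray, (y, observed, cls))
    n = len(y)
    total = 0.0
    for j in np.unique(cls):
        in_j = cls == j
        resp = in_j & observed                 # respondents in class j
        total += in_j.sum() * y[resp].mean()   # n_j * ybar_jR
    return total / n
```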

More about weighting class adjustments

  • Pros: handles nonresponse bias with a single set of weights, even for multivariate \(Y\)

  • Cons: weighting is inefficient and can increase sampling variance if \(Y\) is weakly related to the weighting class variable \(C\)

  • How to choose weighting class adjustments: weighting is only effective for outcomes (\(Y\)) that are associated with the adjustment cell variable (\(C\))

Propensity weighting

  • Let \(X\) denote the variables observed for both respondents and nonrespondents

  • The theory of propensity scores provides a prescription for choosing the coarsest reduction of \(X\) to a weighting class variable \(C\) so that quasirandomization is roughly satisfied

  • Suppose the data are MAR, with \(\phi\) being the unknown parameters of the missingness mechanism: \[ P(M \mid X, Y, \phi) = P(M \mid X, \phi) \] Then quasirandomization is satisfied when \(C\) is chosen to be \(X\)

Response propensity stratification

  • Define the response propensity for unit \(i\) as \[ \rho(x_i, \phi) = P\left(m_i = 0 \mid x_i, \phi\right) \] Under MAR, respondents are a random subsample within strata defined by the propensity score \(\rho(X, \phi)\)

  • Usually \(\phi\) is unknown. So a practical procedure is

    1. Estimate \(\hat{\phi}\) from a binary regression of \(M\) on \(X\), based on respondent and nonrespondent data
    2. Let \(C\) be a grouped variable obtained by coarsening \(\rho\left(X, \hat{\phi}\right)\) into 5 or 10 values
  • Thus, within the same adjustment class, all respondents and nonrespondents have the same value of the grouped propensity score
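
A sketch of this two-step procedure, using scikit-learn's logistic regression and quantile-based grouping (one common choice; the function name and defaults are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_strata(X, m, n_groups=5):
    """Step 1: logistic regression of the missingness indicator m
    (1 = nonrespondent, 0 = respondent) on the fully observed X.
    Step 2: coarsen the estimated response propensities into
    `n_groups` quantile-based adjustment classes."""
    fit = LogisticRegression().fit(X, m)
    rho_hat = fit.predict_proba(X)[:, 0]          # P(m = 0 | x)
    cuts = np.quantile(rho_hat, np.linspace(0, 1, n_groups + 1))
    C = np.digitize(rho_hat, cuts[1:-1])          # class labels 0..n_groups-1
    return rho_hat, C
```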

An alternative procedure: propensity weighting

  • An alternative procedure is to weight respondents \(i\) directly by the inverse propensity score \(\rho\left(X, \hat{\phi}\right)^{-1}\)

  • This method removes nonresponse bias under MAR

  • But it may yield estimates with extremely high sampling variance because respondents with very low estimated response propensities receive large nonresponse weights

  • Also, weighting directly by inverse propensities may place heavy reliance on correct specification of the regression model of \(M\) on \(X\)
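
A sketch of direct inverse-propensity weighting for a mean; the truncation of small propensities at `floor` is an ad hoc guard against the variance problem just noted, not a prescription from the book:

```python
import numpy as np

def propensity_weighted_mean(y_resp, rho_hat_resp, floor=0.01):
    """Weight each respondent by the inverse of its estimated response
    propensity; tiny propensities produce huge weights, so truncate."""
    w = 1.0 / np.clip(rho_hat_resp, floor, None)
    return np.sum(w * y_resp) / np.sum(w)
```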

Example: inverse probability weighted generalized estimating equations (GEE)

  • Let \(x_i\) be the covariates of the GEE, and \(z_i\) be a fully observed vector that predicts the missingness mechanism

  • If \(P(m_i = 1 \mid x_i, y_i, z_i, \phi) = P(m_i = 1 \mid x_i, \phi)\), then the unweighted complete-case GEE is unbiased \[ \sum_{i=1}^r D_i(x_i, \beta)\left[y_i - g(x_i, \beta)\right] = 0 \]

  • If \(P(m_i = 1 \mid x_i, y_i, z_i, \phi) = P(m_i = 1 \mid x_i, z_i, \phi)\), then the inverse probability weighted GEE is unbiased \[ \sum_{i=1}^r w_i(\hat{\alpha}) D_i(x_i, \beta)\left[y_i - g(x_i, \beta)\right] = 0, \quad w_i(\hat{\alpha}) = \frac{1}{p(x_i, z_i \mid \hat{\alpha})} \] where \(p(x_i, z_i \mid \hat{\alpha})\) is the probability of being a complete unit, based on logistic regression of \(m_i\) on \(x_i, z_i\)
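
A sketch of the weighted estimating equation for the special case of an identity link \(g(x_i, \beta) = x_i'\beta\), where \(D_i = x_i\) and the equation reduces to weighted least squares (names and the logistic-regression step are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_gee_identity(x, y, z, m):
    """Solve sum_i w_i * x_i * (y_i - x_i' beta) = 0 over complete units,
    with w_i = 1 / P(complete | x_i, z_i) from a logistic regression of
    the missingness indicator m on (x, z)."""
    xz = np.column_stack([x, z])
    p_complete = LogisticRegression().fit(xz, m).predict_proba(xz)[:, 0]

    r = m == 0                       # complete (respondent) units
    w = 1.0 / p_complete[r]
    xr, yr = x[r], y[r]
    # Closed-form weighted-least-squares solution of the estimating equation
    return np.linalg.solve(xr.T @ (w[:, None] * xr), xr.T @ (w * yr))
```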

Poststratification

  • The weighting class estimator \[ \bar{y}_{\text{wc}} = \frac{1}{n}\sum_{j=1}^J n_j \bar{y}_{j\text{R}} \] uses the sample proportion \(n_j/n\) to estimate the population proportion \(N_j/N\).

  • If, from an external source (e.g., a census or a large survey), we know the population proportions of the weighting classes, then we can use the poststratified mean to estimate \(\bar{Y}\): \[ \bar{y}_{\text{ps}} = \frac{1}{N}\sum_{j=1}^J N_j \bar{y}_{j\text{R}} \]
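
A minimal sketch of the poststratified mean, assuming the population counts are supplied externally as a dict `N_j` keyed by class label:

```python
import numpy as np

def poststratified_mean(y_resp, cls_resp, N_j):
    """Replace the sample proportions n_j / n in the weighting class
    estimator with known population proportions N_j / N."""
    y_resp, cls_resp = np.asarray(y_resp), np.asarray(cls_resp)
    N = sum(N_j.values())
    return sum(N_j[j] * y_resp[cls_resp == j].mean() for j in N_j) / N
```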

Summary of weighting methods

  • Weighted CC estimates are often simple to compute, but the appropriate standard errors can be hard to compute (even asymptotically)

  • Weighting methods treat weights as fixed and known, but these nonresponse weights are computed from observed data and hence are subject to sampling uncertainty

  • Because weighted CC methods discard incomplete units and do not provide an automatic control of sampling variance, they are most useful when

    • Number of covariates is small, and
    • Sample size is large

Available-Case Analysis

Available-case (AC) analysis

  • Available-case analysis: for univariate analyses, include all units where that variable is observed

    • Sample changes from variable to variable according to the pattern of missing data
    • This is problematic if the data are not MCAR
    • Under MCAR, AC can be used to estimate mean and variance for a single variable
  • Pairwise AC: estimates covariance of \(Y_j\) and \(Y_k\) based on units \(i\) where both \(y_{ij}\) and \(y_{ik}\) are observed

    • Pairwise covariance estimator: \[ s_{jk}^{(jk)} = \sum_{i \in I_{jk}} \left( y_{ij} - \bar{y}_j^{(jk)} \right) \left( y_{ik} - \bar{y}_k^{(jk)} \right)/ \left( n^{(jk)} - 1 \right) \] where \(I_{jk}\) is the set of \(n^{(jk)}\) units with both \(Y_j\) and \(Y_k\) observed
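
A sketch of the pairwise estimator, with NaN marking missing entries (function name illustrative):

```python
import numpy as np

def pairwise_cov(Y):
    """Pairwise available-case covariance: entry (j, k) uses only the
    rows of the n x K array Y where both variables are observed."""
    n, K = Y.shape
    S = np.empty((K, K))
    for j in range(K):
        for k in range(K):
            both = ~np.isnan(Y[:, j]) & ~np.isnan(Y[:, k])   # the set I_jk
            yj, yk = Y[both, j], Y[both, k]
            S[j, k] = np.sum((yj - yj.mean()) * (yk - yk.mean())) / (both.sum() - 1)
    return S
```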

Problems with pairwise AC estimators on correlation

  • Correlation estimator 1: \[ r_{jk}^* = \frac{s_{jk}^{(jk)}}{\sqrt{s_{jj}^{(j)} s_{kk}^{(k)}}} \]

    • Problem: it can lie outside of \([-1, 1]\)
  • Correlation estimator 2 corrects the previous problem: \[ r_{jk}^{(jk)} = \frac{s_{jk}^{(jk)}}{\sqrt{s_{jj}^{(jk)} s_{kk}^{(jk)}}} \]

  • Under MCAR, all of these covariance and correlation estimators are consistent

  • However, when \(K \geq 3\), both correlation estimators can yield correlation matrices that are not positive definite!

    • An extreme example: \(r_{12} = 1, r_{13} = 1, r_{23} = -1\)
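
A quick numerical check of this example: the implied correlation matrix has eigenvalues \((-1, 2, 2)\), so it is not even positive semidefinite:

```python
import numpy as np

# r12 = r13 = 1 forces Y1, Y2, Y3 to be perfectly positively correlated,
# contradicting r23 = -1; the negative eigenvalue exposes the inconsistency.
R = np.array([[1.0,  1.0,  1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0,  1.0]])
print(np.linalg.eigvalsh(R))   # [-1.  2.  2.]
```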

Compare CC and AC methods

  • When the data are MCAR and correlations are mild, AC methods are more efficient than CC methods

  • When correlations are large, CC methods are usually better

References

  • Little, R. J., & Rubin, D. B. (2019). Statistical Analysis with Missing Data, 3rd Edition. John Wiley & Sons.