Book Notes: Statistical Analysis with Missing Data -- Ch3 Complete-Case Analysis and Weighting Methods

Complete-case (CC) analysis

  • Complete-case (CC) analysis: use only data points (units) where all variables are observed

  • Loss of information in CC analysis:

    • Loss of precision (larger variance)
    • Bias, when the missingness mechanism is not MCAR. In this case, the complete units are not a random sample of the population
  • In these notes, I will focus on the bias issue

    • Adjusting for the CC analysis bias using weights
    • This idea is closely related to weighting in randomization inference for finite-population surveys

Weighted Complete-Case Analysis

Notations

  • Population size N, sample size n
  • Number of variables (items): K
  • Data: $Y = (y_{ij})$, where $i = 1, \dots, N$ and $j = 1, \dots, K$
  • Design information (about sampling or missingness): Z
  • Sample indicator: $I = (I_1, \dots, I_N)$; for unit $i$, $I_i = \mathbf{1}\{\text{unit } i \text{ is included in the sample}\}$

  • Sample selection processes can be characterized by a distribution for $I$ given $Y$ and $Z$.

Probability sampling

  • Properties of probability sampling

    1. Unconfounded: selection doesn’t depend on $Y$, i.e., $f(I \mid Y, Z) = f(I \mid Z)$

    2. Every unit has a positive (known) probability of selection: $\pi_i = P(I_i = 1 \mid Z) > 0$ for all $i$

  • In an equal-probability sample design, $\pi_i$ is the same for all $i$

Stratified random sampling

  • $Z$ is a variable defining strata. Suppose stratum $Z = j$ has $N_j$ units in total, for $j = 1, \dots, J$

  • In stratum $j$, stratified random sampling takes a simple random sample of $n_j$ units

  • The distribution of $I$ under stratified random sampling is $$f(I \mid Z) = \prod_{j=1}^{J} \binom{N_j}{n_j}^{-1}$$

Example: estimating the population mean $\bar{Y}$

  • An unbiased estimate is the stratified sample mean $$\bar{y}_{st} = \sum_{j=1}^{J} \frac{N_j \bar{y}_j}{N}$$ where $\bar{y}_j$ is the sample mean in stratum $j$

  • Sampling variance approximation: $$v(\bar{y}_{st}) \approx \frac{1}{N^2} \sum_{j=1}^{J} N_j^2 \left( \frac{1}{n_j} - \frac{1}{N_j} \right) s_j^2$$ where $s_j^2$ is the sample variance of $Y$ in stratum $j$

  • A large-sample 95% confidence interval for $\bar{Y}$ is $\bar{y}_{st} \pm 1.96 \sqrt{v(\bar{y}_{st})}$
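As a quick illustration, the stratified mean, its variance approximation, and the large-sample interval can be computed directly. Below is a minimal numpy sketch with made-up stratum sizes and data; `stratified_estimate` is a hypothetical helper, not a function from the book.

```python
import numpy as np

def stratified_estimate(strata, z=1.96):
    """Stratified mean, approximate sampling variance, and a large-sample CI.

    `strata` is a list of (N_j, y_j) pairs: the stratum population size
    and a 1-D array of the n_j sampled values from that stratum.
    """
    N = sum(Nj for Nj, _ in strata)
    # y_bar_st = sum_j N_j * y_bar_j / N
    y_bar_st = sum(Nj * yj.mean() for Nj, yj in strata) / N
    # v(y_bar_st) ~ (1/N^2) * sum_j N_j^2 * (1/n_j - 1/N_j) * s_j^2
    v = sum(Nj**2 * (1 / len(yj) - 1 / Nj) * yj.var(ddof=1)
            for Nj, yj in strata) / N**2
    half = z * np.sqrt(v)
    return y_bar_st, v, (y_bar_st - half, y_bar_st + half)

# Hypothetical two-stratum population
rng = np.random.default_rng(0)
strata = [(1000, rng.normal(10, 2, size=50)),
          (3000, rng.normal(20, 3, size=100))]
mean, var, ci = stratified_estimate(strata)
```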

Weighting methods

  • Main idea: a unit selected with probability $\pi_i$ is “representing” $\pi_i^{-1}$ units in the population, and hence should be given weight $\pi_i^{-1}$.

  • For example, in a stratified random sample

    • A selected unit $i$ in stratum $j$ represents $N_j/n_j$ population units
    • Thus, following the Horvitz-Thompson estimator, the population mean can be estimated by the weighted mean $$\bar{y}_w = \frac{1}{n} \sum_{i=1}^{n} w_i y_i, \quad \pi_i = \frac{n_j}{N_j}, \quad w_i = \frac{n \pi_i^{-1}}{\sum_k \pi_k^{-1}}$$
    • It is not hard to show that $\bar{y}_w = \bar{y}_{st}$
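The identity between the weighted estimate and the stratified mean can be checked numerically. A small sketch with hypothetical stratum sizes and sample values:

```python
import numpy as np

# Two strata with hypothetical sizes and sampled values
Nj = [1000, 3000]                                   # stratum population sizes
samples = [np.array([9.0, 11.0, 10.0]),             # sample from stratum 1
           np.array([19.0, 21.0, 20.0, 20.0])]      # sample from stratum 2

# Stratified mean: sum_j N_j * y_bar_j / N
y_bar_st = sum(N * s.mean() for N, s in zip(Nj, samples)) / sum(Nj)

# Weighted form: pi_i = n_j / N_j, w_i = n * pi_i^{-1} / sum_k pi_k^{-1}
y = np.concatenate(samples)
pi = np.concatenate([np.full(len(s), len(s) / N) for N, s in zip(Nj, samples)])
n = len(y)
w = n * (1 / pi) / np.sum(1 / pi)
y_bar_w = np.mean(w * y)                            # equals y_bar_st
```

Algebraically, $\sum_i \pi_i^{-1} = \sum_j n_j (N_j/n_j) = N$, so the two forms coincide exactly.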

Weighting with nonresponses

  • If the probability of selecting unit $i$ is $\pi_i$, and the probability of response for unit $i$ is $\phi_i$, then $P(\text{unit } i \text{ is observed}) = \pi_i \phi_i$

  • Suppose there are $r$ units observed (respondents). Then the weighted estimate for $\bar{Y}$ is $$\bar{y}_w = \frac{1}{r} \sum_{i=1}^{r} w_i y_i, \quad w_i = \frac{r (\pi_i \phi_i)^{-1}}{\sum_k (\pi_k \phi_k)^{-1}}$$

  • Usually $\phi_i$ is unknown and thus needs to be estimated

Weighting class estimator

  • Weighting class adjustments are used primarily to handle unit nonresponse

  • Suppose we partition the sample into $J$ “weighting classes”. In weighting class $C = j$:

    • $n_j$: the sample size
    • $r_j$: the number of respondents
    • A simple estimator for $\phi_j$ is $\hat{\phi}_j = r_j / n_j$
  • For equal-probability designs, where $\pi_i$ is constant, the weighting class estimator is $$\bar{y}_{wc} = \frac{1}{n} \sum_{j=1}^{J} n_j \bar{y}_{jR}$$ where $\bar{y}_{jR}$ is the respondent mean in class $j$

  • The estimate is unbiased under the following form of MAR assumption (quasirandomization): data are MCAR within each weighting class $j$
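A numerical sketch of the weighting class estimator, with hypothetical class sizes and respondent values:

```python
import numpy as np

# Two weighting classes: n_j sampled units, of which r_j responded
nj = np.array([40, 60])                     # sampled units per class
respondents = [np.array([5.0, 7.0, 6.0]),   # observed y's in class 1 (r_1 = 3)
               np.array([12.0, 14.0])]      # observed y's in class 2 (r_2 = 2)

# Estimated response probabilities: phi_hat_j = r_j / n_j
phi_hat = np.array([len(yj) for yj in respondents]) / nj

# Weighting class estimator: y_bar_wc = (1/n) * sum_j n_j * y_bar_jR
n = nj.sum()
y_bar_wc = sum(n_j * yj.mean() for n_j, yj in zip(nj, respondents)) / n
```

Each class contributes its respondent mean weighted by the class's share of the full sample, which corrects for the differing response rates across classes.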

More about weighting class adjustments

  • Pros: handles nonresponse bias with one set of weights for multivariate $Y$

  • Cons: weighting is inefficient and can increase sampling variance if $Y$ is weakly related to the weighting class variable $C$

  • How to choose weighting classes: weighting is only effective for outcomes ($Y$) that are associated with the adjustment cell variable ($C$). See the right column in the table below.

Propensity weighting

  • The theory of propensity scores provides a prescription for choosing the coarsest reduction of $X$ to a weighting class variable $C$ so that quasirandomization is roughly satisfied

  • Let $X$ denote the variables observed for both respondents and nonrespondents

  • Suppose data are MAR, with $\phi$ being unknown parameters of the missingness mechanism: $$P(M \mid X, Y, \phi) = P(M \mid X, \phi)$$ Then quasirandomization is satisfied when $C$ is chosen to be $X$

Response propensity stratification

  • Define the response propensity for unit $i$ as $\rho(x_i, \phi) = P(m_i = 0 \mid x_i, \phi)$. Under MAR, $$P(m_i = 0 \mid \rho(x_i, \phi), \phi) = \rho(x_i, \phi)$$ i.e., respondents are a random subsample within strata defined by the propensity score $\rho(X, \phi)$

  • Usually ϕ is unknown. So a practical procedure is

    1. Estimate $\hat{\phi}$ from a binary regression of $M$ on $X$, based on respondent and nonrespondent data
    2. Let $C$ be a grouped variable obtained by coarsening $\rho(X, \hat{\phi})$ into 5 or 10 values
  • Thus, within the same adjustment class, all respondents and nonrespondents have the same value of the grouped propensity score
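The two-step procedure can be sketched as follows. This is a minimal numpy implementation on simulated data; the Newton-Raphson (IRLS) fit and the quintile coarsening are my own illustrative choices, not prescribed by the book.

```python
import numpy as np

def logit_irls(X, m, n_iter=25):
    """Fit P(m = 1 | x) = logistic(x'beta) by Newton-Raphson (IRLS).
    X includes an intercept column; m is the 0/1 missingness indicator."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (m - p))
    return beta

# Simulated MAR data: missingness depends only on the observed x
rng = np.random.default_rng(1)
x = rng.normal(size=500)
m = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.0 * x))))

# Step 1: binary regression of M on X
X = np.column_stack([np.ones_like(x), x])
phi_hat = logit_irls(X, m)
rho_hat = 1 - 1 / (1 + np.exp(-X @ phi_hat))   # estimated response propensity

# Step 2: coarsen rho_hat into 5 adjustment classes by its quintiles
edges = np.quantile(rho_hat, [0.2, 0.4, 0.6, 0.8])
C = np.searchsorted(edges, rho_hat)            # class labels 0..4
```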

An alternative procedure: propensity weighting

  • An alternative procedure is to weight each respondent $i$ directly by the inverse estimated propensity score $\rho(x_i, \hat{\phi})^{-1}$

  • This method removes nonresponse bias

  • But it may yield estimates with extremely high sampling variance because respondents with very low estimated response propensities receive large nonresponse weights

  • Also, weighting directly by inverse propensities places heavy reliance on correct specification of the regression model of $M$ on $X$

Example: inverse probability weighted generalized estimating equations (GEE)

  • Let $x_i$ be the covariates of the GEE, and $z_i$ be a fully observed vector that can predict the missingness mechanism

  • If $P(m_i = 1 \mid x_i, y_i, z_i, \phi) = P(m_i = 1 \mid x_i, \phi)$, then the unweighted complete-case GEE is unbiased: $$\sum_{i=1}^{r} D_i(x_i, \beta) \left[ y_i - g(x_i, \beta) \right] = 0$$

  • If $P(m_i = 1 \mid x_i, y_i, z_i, \phi) = P(m_i = 1 \mid x_i, z_i, \phi)$, then the inverse probability weighted GEE is unbiased: $$\sum_{i=1}^{r} w_i(\hat{\alpha}) D_i(x_i, \beta) \left[ y_i - g(x_i, \beta) \right] = 0, \quad w_i(\hat{\alpha}) = \frac{1}{p(x_i, z_i \mid \hat{\alpha})}$$ where $p(x_i, z_i \mid \hat{\alpha})$ is the probability of being a complete unit, based on a logistic regression of $m_i$ on $(x_i, z_i)$
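For the simplest case of an identity-link mean model $g(x_i, \beta) = \mu$, the weighted GEE reduces to solving $\sum_i w_i (y_i - \mu) = 0$, i.e., a weighted mean. The simulation below (my own hypothetical setup, using the true response probabilities in place of a fitted $p(x_i, z_i \mid \hat{\alpha})$) shows the unweighted complete-case mean is biased while the IPW mean is not:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
z = rng.normal(size=n)                          # fully observed predictor
y = 1.0 + z + rng.normal(size=n)                # E[y] = 1
p_obs = 1 / (1 + np.exp(-(0.5 + 1.5 * z)))      # P(m_i = 0 | z_i)
obs = rng.binomial(1, p_obs).astype(bool)       # complete-unit indicator

# Unweighted complete-case mean: biased, since missingness depends on z,
# which is correlated with y
cc_mean = y[obs].mean()

# IPW mean: solves sum_i w_i * (y_i - mu) = 0 with w_i = 1 / p_obs_i
w = 1 / p_obs[obs]
ipw_mean = np.sum(w * y[obs]) / np.sum(w)
```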

Poststratification

  • The weighting class estimator $\bar{y}_{wc} = \frac{1}{n} \sum_{j=1}^{J} n_j \bar{y}_{jR}$ uses the sample proportion $n_j/n$ to estimate the population proportion $N_j/N$.

  • If from an external source (e.g., a census or a large survey) we know the population proportions of the weighting classes, then we can use the poststratified mean to estimate $\bar{Y}$: $$\bar{y}_{ps} = \frac{1}{N} \sum_{j=1}^{J} N_j \bar{y}_{jR}$$
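A short numerical sketch, with hypothetical class counts taken as known from a census:

```python
import numpy as np

Nj = np.array([4000, 6000])                  # known population class sizes
y_bar_jR = np.array([6.0, 13.0])             # respondent means per class
y_bar_ps = np.sum(Nj * y_bar_jR) / Nj.sum()  # (1/N) * sum_j N_j * y_bar_jR
```

The only change from the weighting class estimator is that the known shares $N_j/N$ replace the sample shares $n_j/n$.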

Summary of weighting methods

  • Weighted CC estimates are often simple to compute, but the appropriate standard errors can be hard to compute (even asymptotically)

  • Weighting methods treat weights as fixed and known, but these nonresponse weights are computed from observed data and hence are subject to sampling uncertainty

  • Because weighted CC methods discard incomplete units and do not provide an automatic control of sampling variance, they are most useful when

    • Number of covariates is small, and
    • Sample size is large

Available-Case Analysis

Available-case (AC) analysis

  • Available-case analysis: for a univariate analysis, include all units where that variable is present

    • Sample changes from variable to variable according to the pattern of missing data
    • This is problematic if not MCAR
    • Under MCAR, AC can be used to estimate mean and variance for a single variable
  • Pairwise AC: estimates the covariance of $Y_j$ and $Y_k$ based on the units $i$ where both $y_{ij}$ and $y_{ik}$ are observed

    • Pairwise covariance estimator: $$s_{jk}^{(jk)} = \sum_{i \in I_{jk}} (y_{ij} - \bar{y}_j^{(jk)})(y_{ik} - \bar{y}_k^{(jk)}) \Big/ (n^{(jk)} - 1)$$ where $I_{jk}$ is the set of $n^{(jk)}$ units with both $Y_j$ and $Y_k$ observed
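The pairwise estimator can be sketched directly, encoding missing entries as `np.nan` (`pairwise_cov` is a hypothetical helper, not a library function):

```python
import numpy as np

def pairwise_cov(Y):
    """Pairwise available-case covariances: entry (j, k) uses all units
    with both Y_j and Y_k observed (missing values encoded as np.nan)."""
    K = Y.shape[1]
    S = np.full((K, K), np.nan)
    for j in range(K):
        for k in range(K):
            both = ~np.isnan(Y[:, j]) & ~np.isnan(Y[:, k])   # the set I_jk
            if both.sum() > 1:
                yj, yk = Y[both, j], Y[both, k]
                S[j, k] = np.sum((yj - yj.mean()) * (yk - yk.mean())) / (both.sum() - 1)
    return S

# Toy data: 4 units, 3 variables, two missing entries
Y = np.array([[1.0, 2.0, np.nan],
              [2.0, 1.0, 3.0],
              [3.0, np.nan, 4.0],
              [4.0, 5.0, 6.0]])
S = pairwise_cov(Y)
```

Note that each entry of `S` may be based on a different subset of units, which is exactly what makes the resulting matrix potentially inconsistent.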

Problems with pairwise AC correlation estimators

  • Correlation estimator 1: $$r_{jk} = \frac{s_{jk}^{(jk)}}{\sqrt{s_{jj}^{(j)} s_{kk}^{(k)}}}$$

    • Problem: it can lie outside of $(-1, 1)$
  • Correlation estimator 2 corrects the previous problem: $$r_{jk}^{(jk)} = \frac{s_{jk}^{(jk)}}{\sqrt{s_{jj}^{(jk)} s_{kk}^{(jk)}}}$$

  • Under MCAR, all of these covariance and correlation estimators are consistent

  • However, when $K \geq 3$, both correlation estimators can yield correlation matrices that are not positive definite!

    • An extreme example: $r_{12} = 1,\ r_{13} = 1,\ r_{23} = -1$
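Such an inconsistent matrix can be checked numerically: a matrix with $r_{12} = r_{13} = 1$ and $r_{23} = -1$ has a negative eigenvalue, so it cannot be a valid correlation matrix.

```python
import numpy as np

R = np.array([[ 1.0,  1.0,  1.0],
              [ 1.0,  1.0, -1.0],
              [ 1.0, -1.0,  1.0]])
# A valid correlation matrix must have all eigenvalues >= 0
eigvals = np.linalg.eigvalsh(R)
```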

Compare CC and AC methods

  • When data are MCAR and correlations are mild, AC methods are more efficient than CC methods

  • When correlations are large, CC methods are usually better

References

  • Little, R. J., & Rubin, D. B. (2019). Statistical Analysis with Missing Data, 3rd Edition. John Wiley & Sons.