*For the pdf slides, click here*

# Concepts of MCAR, MAR, MNAR

### Concepts of MCAR, MAR, MNAR

- Missing completely at random (MCAR): the probability of being missing is the same for all cases
- Cause of missing is unrelated to the data

- Missing at random (MAR): the probability of being missing only depends on the observed data
- Cause of missing is unrelated to the missing values

- Missing not at random (MNAR): probability of being missing depends on the missing values themselves

# Ad-hoc Solutions

### Listwise deletion and pairwise deletion

- Listwise deletion (also called complete-case analysis): delete rows which contain one or more missing values
- If data is MCAR, listwise deletion produces unbiased estimates of means, variances, and regression weights (if need to train a predictive model)
- If data is not MCAR, listwise deletion can severely bias the above estimates.

- Pairwise deletion (also called available-case analysis)
- Mean and variance of variable \(X\) are based on all cases with observed data on \(X\)
- Covariance and correlation of \(X\) and \(Y\) is based on all data which both \(X\) and \(Y\) have non-missing values

### Mean imputation

- Compared with the observed data, in the imputed data (observed + imputed values)
- Standard deviations decrease
- Correlation decreases
- Means can be biased if the data is not MCAR.

### Regression imputation

- Build a regression model from the observed data
- Impute the missing values in the response variable with the predicted values from the fitted regression

- The impute values are the most likely values under the model
- However, it decreases the variance of the target variable
- And it increases the correlations between the target and covariates

- Regression imputation, and its modern incarnations in machine learning is probably the most dangerous of all ad-hoc methods

### Stochastic regression imputation

- Build a regression model from the observed data
- Impute a missing value in the response variable with the predicted value
*plus a random draw from the residual*

- Preserves variance and correlation.
- Imputed values can exceed the range (e.g., a negative Ozone level). A more suitable model may resolve this.

### LOCF and BOCF

Last observation carried forward (LOCF) and baseline observation carried forward (BOCF) are for longitudinal data.

LOCF can yield biased estimation even under MCAR.

### Indicator method

- Not for imputation, but for building predictive models
- Only works for missing in covariates, not the target variables

### Summary of ad-hoc imputation methods

- Note: the unbiasness of regression coefficients are assess with the variable containing missing values as the target variable

# Multiple Imputation in a Nutshell

### Multiple imputation creates \(m>1\) complete datasets

- Three steps of multiple imputation
- Imputation
- Analysis: train separate models
- Pooling: variance among \(m\) parameter estimates combines the conventional sampling variance (within-imputation variance) and the extra variance caused by the missing data (between-imputation variance)

### Why using multiple imputation?

- It provides a mechanism to deal with the inherent uncertainty of the imputations
- It separate the solution of the missing data problem from the solution of the complete-data problem (train predictive models on complete data)

### Multiple imputation example using the mice package

```
## Load the mice package
library(mice);
## Impute 20 times, using preditive mean matching
imp <- mice(airquality, seed = 1, m = 20, print = FALSE)
## Fit linear regressions
fit <- with(imp, lm(Ozone ~ Wind + Temp + Solar.R))
## Pooled regression estimates
pander(summary(pool(fit)))
```

term | estimate | std.error | statistic | df | p.value |
---|---|---|---|---|---|

(Intercept) | -60.21 | 21.57 | -2.791 | 100.3 | 0.006 |

Wind | -3.174 | 0.644 | -4.927 | 83.29 | 0 |

Temp | 1.584 | 0.228 | 6.959 | 125.7 | 0 |

Solar.R | 0.058 | 0.023 | 2.454 | 79.63 | 0.016 |

### References

Van Buuren, S. (2018). Flexible Imputation of Missing Data, 2nd Edition. CRC press.