When multiple imputation is better than maximum likelihood

by Paul von Hippel

My colleague Paul Allison has written a provocative post titled “Why maximum likelihood is better than multiple imputation.” In support of his title, he points out that maximum likelihood has the following advantages:

  • Maximum likelihood is faster and more efficient than multiple imputation.
  • Maximum likelihood presents users with fewer choices to make — and fewer ways to screw up.
  • Maximum likelihood produces the same result every time you run it. By contrast, multiple imputation estimates vary from one run to another — although with enough imputations you can get that variation down to an acceptable level.

These are excellent points. Nevertheless, multiple imputation is not going away, because it is much more flexible and can be applied in more situations than maximum likelihood.

Maximum likelihood has fairly narrow requirements. First, and most obviously, you have to estimate your model using maximum likelihood. If you want to use any estimation method that doesn’t just maximize a likelihood — e.g., if you’re using method of moments, or weighted least squares, or clustered standard errors — you have to use multiple imputation.

Second, maximum likelihood is only good for estimating model parameters and quantities that can be derived easily from model parameters. If you want to estimate some other quantity, maximum likelihood may be inconvenient or even unusable. For example, you might want to estimate the percentage of schools whose test scores exceed a certain threshold, or the percentage of wives who outearn their husbands. In multiply imputed data, you can do that the same way you’d do it in complete data: with simple descriptive statistics. But with maximum likelihood, you’d have to figure out how those quantities depended on the model parameters.
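To make the "simple descriptive statistics" point concrete, here is a minimal Python sketch of how the share of wives who outearn their husbands might be computed from multiply imputed data. The data layout (a list of completed datasets, each a list of wife/husband income pairs) and the function name pool_proportion are assumptions made for illustration, and the within-imputation variance assumes simple random sampling. The pooling step follows Rubin's rules: average the per-dataset estimates, then combine within- and between-imputation variance.

```python
from statistics import mean, variance


def pool_proportion(imputed_datasets):
    """Pool the share of couples in which the wife outearns the husband.

    imputed_datasets: list of M completed datasets, each a list of
    (wife_income, husband_income) pairs (a hypothetical layout).
    Returns (pooled estimate, pooled standard error) using Rubin's rules.
    """
    m = len(imputed_datasets)
    estimates, within_vars = [], []
    for data in imputed_datasets:
        n = len(data)
        # Plain descriptive statistic, computed exactly as in complete data.
        p = sum(w > h for w, h in data) / n
        estimates.append(p)
        # Sampling variance of a proportion, assuming simple random sampling.
        within_vars.append(p * (1 - p) / n)

    q_bar = mean(estimates)                      # pooled point estimate
    w_bar = mean(within_vars)                    # within-imputation variance
    b = variance(estimates) if m > 1 else 0.0    # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b
    return q_bar, total_var ** 0.5
```

Nothing in this calculation refers to a parametric income model; the model enters only when the missing incomes are filled in.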

And you’d have to really believe your parametric model. Take the example of wives who outearn their husbands. Using maximum likelihood, you’d need some parametric model for the joint distribution of husbands’ and wives’ incomes. But most parametric models fit empirical income distributions poorly (von Hippel, Hunter, & Drown, 2017), and the worst-fitting distributions (e.g., the normal and the lognormal) are the ones most commonly used for maximum likelihood with incomplete data.

If you calculate the percentage of wives outearning their husbands from a distribution that fits badly, you’re going to get a biased answer. That’s true even in complete data. Having incomplete data doesn’t help.

To be sure, multiple imputation would often also use an unrealistic parametric model for the joint distribution of incomes (Schafer, 1997). But it doesn’t matter as much, because in multiple imputation you use the parametric model only to impute missing incomes. In the imputed data, the observed incomes still follow their empirical distribution. When you calculate the percentage of wives who outearn their husbands in imputed data, you just calculate descriptive statistics from the mix of observed and imputed values, without assuming any particular distribution. Your answer will be more accurate, especially if few values are imputed.
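A toy simulation can illustrate why the misfit matters less under imputation. The sketch below is built entirely on assumptions of my own choosing: skewed "incomes" drawn from an exponential distribution, values missing completely at random, and a deliberately poor normal model. It compares two estimates of the share of incomes above a threshold: one read directly off the fitted normal (standing in for a fully model-based answer), and one computed descriptively from observed values plus values imputed from that same normal. For brevity it does a single imputation with fixed parameter estimates, which is simpler than proper multiple imputation.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def compare_estimates(n=20000, missing_rate=0.2, threshold=3.0, seed=0):
    """Return (true share, model-based share, imputation-based share)."""
    rng = random.Random(seed)
    incomes = [rng.expovariate(1.0) for _ in range(n)]             # skewed data
    is_missing = [rng.random() < missing_rate for _ in range(n)]   # MCAR
    observed = [x for x, miss in zip(incomes, is_missing) if not miss]

    # Deliberately poor parametric model: a normal fitted to observed values.
    model = NormalDist(mean(observed), stdev(observed))

    # (a) Model-based answer: read the share off the fitted distribution.
    model_share = 1 - model.cdf(threshold)

    # (b) Imputation-based answer: keep observed values as they are and
    #     draw only the missing ones from the (misspecified) model.
    completed = [
        rng.gauss(model.mean, model.stdev) if miss else x
        for x, miss in zip(incomes, is_missing)
    ]
    imputed_share = sum(x > threshold for x in completed) / n

    true_share = math.exp(-threshold)   # P(X > threshold) for Exponential(1)
    return true_share, model_share, imputed_share
```

Under these assumptions the imputation-based share lands much closer to the truth, because only the imputed fraction of the data inherits the model's misfit.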

And multiple imputation doesn’t have to use a parametric model. It could impute nonparametrically, imputing missing wives’ incomes by sampling from the observed incomes of wives with similar husbands and other characteristics (Andridge & Little, 2010). Nonparametric imputation doesn’t impose an unrealistic distribution on the imputed data.
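As a rough illustration, a nearest-neighbor hot deck might look like the Python sketch below. This is one simple variant written for this post, not the specific procedure of any cited paper: it matches on husband's income alone, and the function name, data layout, and choice of k are all hypothetical. Each missing wife's income is replaced by the observed income of a randomly chosen donor among the k couples with the most similar husband's income.

```python
import random


def hot_deck_impute(couples, k=5, seed=None):
    """Fill missing wives' incomes by sampling from similar donors.

    couples: list of (husband_income, wife_income) pairs, with
    wife_income set to None when it is missing (a hypothetical layout).
    Returns a new list; the input is left unchanged.
    """
    rng = random.Random(seed)
    donors = [(h, w) for h, w in couples if w is not None]
    if not donors:
        raise ValueError("need at least one observed wife's income")

    completed = []
    for h, w in couples:
        if w is None:
            # The k donors whose husbands' incomes are closest to this one.
            nearest = sorted(donors, key=lambda d: abs(d[0] - h))[:k]
            # Borrow an actually observed income, so no distribution is imposed.
            w = rng.choice(nearest)[1]
        completed.append((h, w))
    return completed
```

Running this several times with different seeds would give multiple completed datasets. Every imputed value is an income that some real wife actually reported.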

Even when it’s used parametrically, multiple imputation is more flexible. Maximum likelihood requires a parametric model for the joint distribution of all the incomplete variables. But multiple imputation can impute each variable by regressing it on the others — using a linear regression to impute a continuous variable, a logistic regression to impute a dummy variable, and a Poisson regression to impute a count (Raghunathan, Lepkowski, Van Hoewyk, & Solenberger, 2001; van Buuren & Oudshoorn, 1999). This approach is sometimes criticized because it’s not clear what the joint distribution is. But often you don’t know the joint distribution anyway, and it would be a mess to specify if you did.
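The variable-by-variable idea can be sketched in a few lines of Python. The version below is a stripped-down illustration under strong assumptions: two continuous variables, a simple linear regression for each, and imputations drawn using fixed parameter estimates. A fuller implementation would swap in logistic or Poisson regressions for dummy or count variables and would also account for uncertainty in the regression parameters. The function names and data layout are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for y regressed on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return intercept, slope, sd


def chained_impute(rows, n_iter=10, seed=None):
    """Impute two-column data by regressing each column on the other.

    rows: list of [a, b] with None marking a missing value.
    Returns a completed copy; observed values are never altered.
    """
    rng = random.Random(seed)
    data = [list(r) for r in rows]
    missing = [[v is None for v in r] for r in rows]

    # Start by filling each gap with its column's observed mean.
    for j in (0, 1):
        col_mean = mean(r[j] for r in rows if r[j] is not None)
        for r in data:
            if r[j] is None:
                r[j] = col_mean

    # Cycle through the variables, re-imputing each from the other.
    for _ in range(n_iter):
        for target, predictor in ((0, 1), (1, 0)):
            # Fit on rows whose target value was actually observed.
            fit_rows = [r for r, m in zip(data, missing) if not m[target]]
            a, b, sd = fit_line([r[predictor] for r in fit_rows],
                                [r[target] for r in fit_rows])
            for r, m in zip(data, missing):
                if m[target]:
                    # Prediction plus random noise, not just the fitted value.
                    r[target] = a + b * r[predictor] + rng.gauss(0, sd)
    return data
```

Note that no joint distribution is ever written down: the only modeling decisions are the two regressions.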

In short, when the requirements for maximum likelihood are met, it really is a better method. But in many settings, maximum likelihood’s requirements aren’t met, and multiple imputation is a better option. And in some settings, maximum likelihood can’t be used, and multiple imputation is the only option.

References

Andridge, R. R., & Little, R. J. A. (2010). A Review of Hot Deck Imputation for Survey Non-response. International Statistical Review, 78(1), 40–64.

Raghunathan, T. E., Lepkowski, J. M., Van Hoewyk, J., & Solenberger, P. W. (2001). A Multivariate Technique for Multiply Imputing Missing Values Using a Sequence of Regression Models. Survey Methodology, 27(1), 85–95.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. London; New York: Chapman & Hall.

Van Buuren, S., & Oudshoorn, K. (1999). Flexible multivariate imputation by MICE (TNO report No. PG/VGZ/99.054). Leiden, Netherlands.

von Hippel, P. T., Hunter, D. J., & Drown, M. (2017). Better estimates from binned incomes: Interpolated CDFs and mean-matching. Sociological Science, 4, 641–655.