Why maximum likelihood is better than multiple imputation

by Paul Allison
This post first appeared at statisticalhorizons.com in 2012.

I’ve long been an advocate of multiple imputation for handling missing data. For example, in my two-day Missing Data seminar, I spend about two-thirds of the course on multiple imputation, using PROC MI in SAS and the mi command in Stata. The other third covers maximum likelihood (ML).  Both methods are pretty good, especially when compared with more traditional methods like listwise deletion or conventional imputation. ML and multiple imputation make similar assumptions, and they have similar statistical properties.

The reason I spend more time talking about multiple imputation is not that I prefer it. On the contrary, I prefer to use maximum likelihood to handle missing data whenever possible. One reason is that ML is simpler, at least if you have the right software. That simplicity is precisely why multiple imputation gets more class time: it takes longer to explain all the different ways to do it and all the little things you have to keep track of and be careful about.

The other big problem with multiple imputation is that, to be effective, your imputation model has to be “congenial” with your analysis model. The two models don’t have to be identical, but they can’t have major inconsistencies, and there are lots of ways they can be inconsistent. For example, if your analysis model has interactions, then your imputation model had better include them as well. If your analysis model uses a transformed version of a variable, your imputation model should use the same transformation. That’s not an issue with ML because everything is done under a single model.

One other attraction of ML is that it produces a deterministic result. By contrast, multiple imputation gives you a different result every time you run it, because random draws are a crucial part of the process. You can reduce that variability as much as you want by imputing more data sets, and Paul von Hippel has recently proposed a way to decide how many are enough. But there’s always going to be some variability with MI. With ML there’s none.
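A quick way to see where that run-to-run variability comes from is Rubin’s rules, the standard recipe for pooling the m completed-data analyses. Here is a minimal Python sketch; the function name `pool` and the illustrative numbers are my own, not from this post:

```python
from statistics import mean, variance

def pool(estimates, squared_ses):
    """Combine results from m imputed data sets with Rubin's rules.

    estimates   : the m point estimates of one parameter
    squared_ses : the m squared standard errors of those estimates
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(squared_ses)         # within-imputation variance
    b = variance(estimates)       # between-imputation variance (sample variance)
    t = w + (1 + 1 / m) * b       # total variance of the pooled estimate
    return q_bar, t

# Hypothetical results from m = 3 imputed data sets:
q, t = pool([1.02, 0.98, 1.05], [0.04, 0.05, 0.045])
```

The between-imputation variance b is the term the random draws feed. Raising m shrinks the (1 + 1/m) factor and stabilizes b, which is why imputing more data sets reduces, but never eliminates, the run-to-run variability.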

The catch with ML is that you need specially designed software to implement it. Fortunately, in recent years several major statistical packages have introduced methods for handling missing data by ML. For example, the default in most mixed modeling software (like PROC MIXED in SAS or the xtmixed command in Stata) is to use ML to handle missing data on the response variable. For linear models with missing data on predictors, there are now easy-to-use implementations of ML—often labeled “full information maximum likelihood” (FIML)—in both SAS (PROC CALIS) and Stata (the sem command). For logistic regression and Cox regression, the only commercial package that does ML for missing data is Mplus.

To get the whole story, you can download the paper that accompanied my keynote address at the 2012 SAS Global Forum.

Paul von Hippel has written a rejoinder to this post titled “When Multiple Imputation is Better than Maximum Likelihood.”