Models for Imputing Missing Data, Including Methods for Assessing Sensitivity of Conclusions to Them
Speaker: Donald Rubin, John L. Loeb Professor of Statistics, Harvard University
Abstract: There are two relatively standard approaches for dealing with missing data in statistics, one based on “selection models” and one based on “pattern-mixture” models. The former is focused on formulating a model for complete data and then effectively imputing missing data so that the combined observed and missing data fit the assumed model for the complete data. In contrast, the latter effectively fits a different model for each pattern of observed and missing data, thereby directly revealing sensitivity of conclusions to assumptions about distributions for which there are no actual observed data available for estimation. A third class of models, which have remained mostly recondite, is based on “Gibbs” factorizations; although these may not imply a valid joint distribution, they have enjoyed success in applications because of their ease of use when implemented by MCMC computer software for multiple imputation, such as in SAS, Stata, and MICE. The consideration of sensitivity of conclusions to assumptions unassailable by observed data, whether implicit, as with selection models, or explicit, as with pattern-mixture models, is a critical ingredient of satisfactory analyses of data sets with missing values. Graphical displays, such as “enhanced tipping point analyses” implemented using modern computing, are critical ingredients for this enterprise.
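
For readers unfamiliar with the terminology, the three model classes named in the abstract are commonly written as different factorizations of the same joint distribution. The notation below is an assumption for illustration and is not taken from the talk: Y denotes the complete data with variables Y_1, ..., Y_k, R the missingness indicators, and θ, ψ, φ, π generic parameters.

```latex
% Selection model: a complete-data model times a missingness mechanism
p(Y, R \mid \theta, \psi) = p(Y \mid \theta)\, p(R \mid Y, \psi)

% Pattern-mixture model: a separate data model for each missingness pattern
p(Y, R \mid \phi, \pi) = p(Y \mid R, \phi)\, p(R \mid \pi)

% "Gibbs"-style specification: one conditional model per variable,
% which need not correspond to any valid joint distribution
p(Y_j \mid Y_{-j}, \theta_j), \qquad j = 1, \dots, k
```

In the pattern-mixture form, the distribution of the missing values within a pattern appears explicitly and cannot be estimated from observed data, which is why the sensitivity to assumptions is visible there and only implicit in the selection form.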