Dealing with Missing Data

Many methods have been proposed for dealing with missing data,¹ but these typically make assumptions that may be difficult or impossible to verify. Michael Daniels and Joseph Hogan summarize some of the problems as follows:

"When data are incomplete, inference about parameters of interest cannot be carried out without the benefit of subjective assumptions about the distribution of missing responses. They are subjective because data cannot be used to critique them. Some of these assumptions are used with such regularity that we forget they are being made; for example, when commercial software such as SAS or Stata is used to analyze incomplete longitudinal data using a random effects model, the missing at random (MAR) assumption is being used; when the Kaplan-Meier estimator is used to summarize a survival curve from censored event times, non-informative censoring is being assumed. Neither assumption can be formally checked, so the validity of inferences relies on subjective judgment."²

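To make the Kaplan-Meier point concrete, here is a minimal sketch in Python. It is not taken from Daniels and Hogan. The function name `kaplan_meier` and the four-subject data set are hypothetical, chosen only for illustration. The estimator silently treats a censored subject as if its remaining survival resembled that of the subjects still at risk. The second call shows one extreme alternative that fits the observed data equally well: the censored subject failed at the moment it was censored.

```python
from collections import Counter


def kaplan_meier(observations):
    """Return [(time, survival)] at each distinct event time.

    observations: list of (time, event_observed) pairs, where
    event_observed is False for a censored subject.
    """
    events = Counter(t for t, observed in observations if observed)
    exits = Counter(t for t, _ in observations)
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Events and censorings both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: the subject observed until time 2 is censored.
data = [(1, True), (2, False), (3, True), (4, True)]

# Standard estimate, which assumes non-informative censoring.
km_curve = kaplan_meier(data)

# Extreme alternative: the censored subject failed at its censoring time.
worst_case = kaplan_meier([(t, True) for t, _ in data])

print(km_curve)
print(worst_case)
```

Under these assumptions the two curves differ after the censoring time (for example, estimated survival just after time 3 is 0.375 in the first case and 0.25 in the second), yet the observed data are identical in both. The choice between them rests on judgment about why the subject was censored, not on anything the data can show.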
Thus dealing with missing data is a real problem in statistics. There are at least a couple of types of active research in this area:


1. See C. K. Enders and A. C. Gottschall (2011), The Impact of Missing Data on the Ethical Quality of a Research Study, Chapter 14 in A. T. Panter and S. K. Sterba, Handbook of Ethics in Quantitative Methodology, Routledge, for discussion of some such methods.

2. M. Daniels and J. Hogan (2008). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis, Chapman and Hall/CRC, pp. xvii - xviii.

3. See Note 3

4. See, for example, D. Madigan and P. Ryan (2011), What can we really learn from observational studies? Epidemiology vol 22, pp. 629 - 631, available at,44&cluster=5198548572751487812 

Last updated February 4, 2013