Model Selection Criteria for Missing-Data Problems Using the EM Algorithm

Authors: Ibrahim, Joseph G*; Zhu, Hongtu; Tang, Niansheng
Source: Journal of the American Statistical Association, 2008, 103(484): 1648-1658.
DOI: 10.1198/016214508000001057

Abstract

We consider novel methods for the computation of model selection criteria in missing-data problems based on the output of the EM algorithm. The methodology is very general and can be applied to numerous situations involving incomplete data within an EM framework, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Toward this goal, we develop a class of information criteria for missing-data problems called IC_{H,Q}, which yields the Akaike information criterion and the Bayesian information criterion as special cases. The computation of IC_{H,Q} requires an analytic approximation to a complicated function, called the H-function, along with output from the EM algorithm used in obtaining maximum likelihood estimates. The approximation to the H-function leads to a large class of information criteria, called IC_{H̃(k),Q}. Theoretical properties of IC_{H̃(k),Q}, including consistency, are investigated in detail. To eliminate the analytic approximation to the H-function, a computationally simpler approximation to IC_{H,Q}, called IC_Q, is proposed, the computation of which depends solely on the Q-function of the EM algorithm. Advantages and disadvantages of IC_{H̃(k),Q} and IC_Q are discussed and examined in detail in the context of missing-data problems. Extensive simulations are given to demonstrate the methodology and examine the small-sample and large-sample performance of IC_{H̃(k),Q} and IC_Q in missing-data problems. An AIDS data set also is presented to illustrate the proposed methodology.
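
Note: the abstract names the criteria but does not state their formulas. The sketch below shows how they would relate under the standard EM decomposition of the observed-data log-likelihood. The symbols D_obs, D_mis, D_com (observed, missing, and complete data), the estimate \hat\theta, the parameter dimension d, the sample size n, and the penalty term c_n are notation assumed here for illustration and may differ from the paper's own.

```latex
% Standard EM identity (holds for any \theta, \theta'):
%   Q is the expected complete-data log-likelihood,
%   H is the expected log conditional density of the missing data.
\begin{align*}
  \log f(D_{\mathrm{obs}} \mid \theta)
    &= Q(\theta \mid \theta') - H(\theta \mid \theta'), \\
  Q(\theta \mid \theta')
    &= E\!\left[\log f(D_{\mathrm{com}} \mid \theta) \,\middle|\, D_{\mathrm{obs}}, \theta'\right], \\
  H(\theta \mid \theta')
    &= E\!\left[\log f(D_{\mathrm{mis}} \mid D_{\mathrm{obs}}, \theta) \,\middle|\, D_{\mathrm{obs}}, \theta'\right].
\end{align*}

% Assumed form of the criteria, evaluated at the EM estimate \hat\theta:
\begin{align*}
  \mathrm{IC}_{H,Q}
    &= -2\log f(D_{\mathrm{obs}} \mid \hat\theta) + c_n
     = -2\,Q(\hat\theta \mid \hat\theta) + 2\,H(\hat\theta \mid \hat\theta) + c_n, \\
  \mathrm{IC}_{Q}
    &= -2\,Q(\hat\theta \mid \hat\theta) + c_n.
\end{align*}

% Special cases of the penalty (assumed): c_n = 2d gives AIC,
% c_n = d \log n gives BIC.
```

Read this way, IC_Q drops the H term entirely, which is why it needs only the Q-function, while IC_{H̃(k),Q} would replace H with an analytic approximation H̃(k).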