Abstract

To facilitate checking and improvement of a Bayesian model, we define an outlier as an observation or group of observations that is "surprising" relative to its predictive distribution under the model, given the remainder of the data. Outlyingness can therefore be measured by the posterior predictive case-deleted p-value of any scalar summary of interest of the (possibly multivariate) observation. It is also sometimes useful to condition on part of the data for the potentially outlying case, such as its pattern of missing data, which defines a conditional outlier.
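For concreteness, the case-deleted p-value can be written as a tail probability under the leave-one-out predictive distribution. The notation here is ours, not the paper's: $T$ is the scalar summary, $y_{(-i)}$ is the data with case $i$ removed, $y_i^{\mathrm{rep}}$ is a replicate of case $i$, and $r_i$ is the conditioning information (for example, the missing-data pattern). The upper-tail convention and the assumption that $y_i$ is conditionally independent of $y_{(-i)}$ given all parameters $\theta$ are also assumptions of this sketch.

```latex
p_i = \Pr\{ T(y_i^{\mathrm{rep}}) \ge T(y_i) \mid y_{(-i)} \}
    = \int \Pr\{ T(y_i^{\mathrm{rep}}) \ge T(y_i) \mid \theta \} \, p(\theta \mid y_{(-i)}) \, d\theta,
\qquad
p_i^{\mathrm{cond}} = \Pr\{ T(y_i^{\mathrm{rep}}) \ge T(y_i) \mid y_{(-i)}, r_i \}.
```

A small $p_i$ (or a value close to 1, for a two-sided check) flags case $i$ as surprising given the rest of the data.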
When the parameters of interest have been sampled from their posterior distribution, the case-deleted p-value can be calculated by reweighting the sample to reflect deletion of the target observation and then drawing from the predictive distribution of that observation. This yields large computational savings over refitting the model once for each deleted case. One efficient extension of the basic reweighting approach exploits the conditional independence structure of most hierarchical models. Another uses weighted systematic sampling to accommodate the dependence structure of the MCMC sample.
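The reweighting step can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' supplementary code. A toy normal model $y_j \sim N(\mu, \sigma^2)$ stands in for the hierarchical models, independent exact posterior draws stand in for an MCMC sample, the summary $T$ is the observation itself, and the p-value is upper-tailed. If $y_i$ is conditionally independent of the other cases given $\theta$, then $p(\theta \mid y_{(-i)}) \propto p(\theta \mid y) / p(y_i \mid \theta)$. Each draw $\theta_s$ therefore gets weight $w_s \propto 1/p(y_i \mid \theta_s)$. The draws are then resampled with a generic weighted systematic sampling scheme, which may differ from the paper's exact scheme, and one replicate of $y_i$ is drawn per resampled $\theta_s$. All function names (`posterior_draws`, `case_deleted_weights`, `systematic_resample`, `case_deleted_pvalue`) are hypothetical.

```python
import numpy as np


def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) at y, vectorised over parameter draws."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2


def posterior_draws(y, n_draws, rng):
    """Exact draws for a toy normal model under the prior p(mu, sigma^2) ∝ 1/sigma^2.

    Hypothetical stand-in for an MCMC sample from p(theta | y) fitted to ALL cases.
    """
    n = y.size
    s2 = y.var(ddof=1)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)  # scaled inverse chi-square
    mu = rng.normal(y.mean(), np.sqrt(sigma2 / n))                # mu | sigma^2, y
    return mu, np.sqrt(sigma2)


def case_deleted_weights(y_i, mu, sigma):
    """Normalised importance weights w_s ∝ 1 / p(y_i | theta_s).

    Shifts draws from p(theta | y) towards p(theta | y with case i deleted),
    assuming y_i is conditionally independent of the other cases given theta.
    """
    log_w = -normal_logpdf(y_i, mu, sigma)
    log_w -= log_w.max()  # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()


def systematic_resample(weights, m, rng):
    """Weighted systematic sampling: one uniform offset, m evenly spaced points.

    Returns m indices; index s appears roughly m * weights[s] times, and
    zero-weight indices are never selected.
    """
    positions = (rng.random() + np.arange(m)) / m
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    idx = np.searchsorted(cumulative, positions, side="right")
    return np.minimum(idx, len(weights) - 1)  # guard against a position rounding to 1.0


def case_deleted_pvalue(y_i, mu, sigma, m, rng):
    """Upper-tail case-deleted p-value with the identity summary T(y) = y."""
    idx = systematic_resample(case_deleted_weights(y_i, mu, sigma), m, rng)
    # One predictive replicate of the deleted case per resampled parameter draw.
    y_rep = rng.normal(mu[idx], sigma[idx])
    return float(np.mean(y_rep >= y_i))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.append(rng.normal(0.0, 1.0, size=50), 6.0)  # last case is a planted outlier
    mu, sigma = posterior_draws(y, 4000, rng)
    for i in (0, y.size - 1):
        print(i, round(float(y[i]), 3), case_deleted_pvalue(y[i], mu, sigma, 4000, rng))
```

The same draws `mu` and `sigma` are reused for every case; only the weights change, so the model is fitted once rather than once per deleted case.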
Outlier checks are illustrated with hierarchical models for two datasets: a standard linear hierarchical model for rat growth and a complex ordinal model for survey data with nonignorably missing responses.
Sample code and data for the rat growth example are available as an online supplement.

  • Publication date: 2010-12