Goodness-of-Fit of Conditional Regression Models for Multiple Imputation

Cabras, Stefano; Castellanos Nueda, Maria Eugenia; Quiros, A.

We propose the calibrated posterior predictive p-value (cppp) as an interpretable goodness-of-fit (GOF) measure for regression models in sequential regression multiple imputation (SRMI). The cppp is uniformly distributed under the assumed model, whereas the posterior predictive p-value (ppp) in general is not, particularly as the percentage of missing data, pm, increases. Uniformity of the cppp allows the analyst to properly evaluate the evidence against the assumed model. We show the advantages of the cppp over the ppp in terms of power to detect common departures from the assumed model and, more importantly, in terms of robustness with respect to pm. In the imputation phase, which provides a complete database for general statistical analyses, default and improper priors are usually needed, whereas the cppp requires a proper prior on the regression parameters. We avoid this problem by using a minimum training sample, which turns the improper prior into a proper distribution. The dependence on the training sample is naturally accounted for by changing the training sample at each step of the SRMI. Our results are based on theoretical considerations, simulation studies, and an application to a real data set of anthropometric measures.
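As a rough illustration of the calibration idea only, the following Python sketch computes a ppp and then calibrates it against its own distribution under the prior predictive, taking cppp = Pr(ppp(Y) <= ppp(y_obs)) with Y drawn from the prior predictive. It is not the procedure used in this work: it assumes complete data (no missingness, no SRMI, no training sample), a normal model with known variance, a proper conjugate normal prior on the mean, and the sample maximum as discrepancy. The function names `ppp` and `cppp` and all default values are hypothetical choices made for the example.

```python
import numpy as np


def ppp(y, rng, prior_mean=0.0, prior_sd=2.0, sigma=1.0, n_rep=500):
    """Monte Carlo ppp for T(y) = max(y) under y_i ~ N(mu, sigma^2), sigma known."""
    n = y.size
    # Conjugate normal posterior for mu (assumed proper prior N(prior_mean, prior_sd^2)).
    post_prec = 1.0 / prior_sd**2 + n / sigma**2
    post_mean = (prior_mean / prior_sd**2 + y.sum() / sigma**2) / post_prec
    post_sd = post_prec ** -0.5
    # Draw mu from the posterior, then one replicated data set per draw.
    mu = rng.normal(post_mean, post_sd, size=n_rep)
    y_rep = rng.normal(mu[:, None], sigma, size=(n_rep, n))
    # Share of replicates whose maximum is at least the observed maximum.
    return float(np.mean(y_rep.max(axis=1) >= y.max()))


def cppp(y_obs, rng, prior_mean=0.0, prior_sd=2.0, sigma=1.0, n_cal=300, n_rep=500):
    """Return (ppp, cppp), calibrating the ppp under the prior predictive."""
    p_obs = ppp(y_obs, rng, prior_mean, prior_sd, sigma, n_rep)
    # Prior predictive data sets: this step is why a proper prior is required.
    mu = rng.normal(prior_mean, prior_sd, size=n_cal)
    p_ref = np.array([
        ppp(rng.normal(m, sigma, size=y_obs.size), rng,
            prior_mean, prior_sd, sigma, n_rep)
        for m in mu
    ])
    # Calibrated value: how often a model-generated ppp is as small as the observed one.
    return p_obs, float(np.mean(p_ref <= p_obs))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.normal(0.0, 1.0, size=30)
    y[0] += 6.0  # inject one gross outlier, a departure from the assumed model
    p, c = cppp(y, rng)
    print(f"ppp = {p:.3f}, cppp = {c:.3f}")
```

Because the calibration step simulates from the prior predictive, the sketch makes concrete why an improper prior cannot be used directly for the cppp; in the SRMI setting described above, that role is played by the prior made proper through the training sample.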