Goodness-of-Fit of Conditional Regression Models for Multiple Imputation

Cabras, Stefano; Castellanos Nueda, M. E.
2011

Abstract

We propose the calibrated posterior predictive p-value (cppp) as an interpretable goodness-of-fit (GOF) measure for regression models in sequential regression multiple imputation (SRMI). The cppp is uniformly distributed under the assumed model, whereas the posterior predictive p-value (ppp) in general is not, particularly as the percentage of missing data, pm, increases. Uniformity of the cppp allows the analyst to properly evaluate the evidence against the assumed model. We show the advantages of the cppp over the ppp in terms of power to detect common departures from the assumed model and, more importantly, in terms of robustness with respect to pm. In the imputation phase, which provides a complete database for general statistical analyses, default, improper priors are usually needed, whereas the cppp requires a proper prior on the regression parameters. We resolve this conflict by using a minimal training sample that turns the improper prior into a proper distribution. The dependence on the training sample is naturally accounted for by changing the training sample at each step of the SRMI. Our results are based on theoretical considerations, simulation studies, and an application to a real data set of anthropometric measures.
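
To make the contrast between the ppp and the cppp concrete, the following is a minimal sketch on a toy model, not the SRMI setting or the training-sample construction of the paper. It assumes y_i ~ N(mu, 1) with a proper prior mu ~ N(0, tau2), uses the sample mean as the discrepancy so that the ppp has a closed form, and calibrates by comparing the observed ppp with the ppp of data sets drawn from the prior predictive distribution. All function names and parameter values are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its CDF


def ppp_mean(ybar, n, tau2):
    """Closed-form ppp for T(y) = sample mean in the toy model
    y_i ~ N(mu, 1), mu ~ N(0, tau2): P(T(y_rep) >= T(y_obs) | y_obs)."""
    post_var = 1.0 / (n + 1.0 / tau2)        # posterior variance of mu
    post_mean = post_var * n * ybar           # posterior mean of mu
    pred_sd = (post_var + 1.0 / n) ** 0.5     # sd of the replicated sample mean
    return 1.0 - STD_NORMAL.cdf((ybar - post_mean) / pred_sd)


def draw_prior_predictive_ybar(n, tau2, rng):
    """Draw the sample mean of one data set from the prior predictive."""
    mu = rng.gauss(0.0, tau2 ** 0.5)
    return rng.gauss(mu, (1.0 / n) ** 0.5)


def cppp_mean(ybar_obs, n, tau2, rng, n_sim=2000):
    """Monte Carlo calibration: share of prior-predictive data sets whose
    ppp is at most the observed ppp."""
    p_obs = ppp_mean(ybar_obs, n, tau2)
    hits = 0
    for _ in range(n_sim):
        ybar_sim = draw_prior_predictive_ybar(n, tau2, rng)
        if ppp_mean(ybar_sim, n, tau2) <= p_obs:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    rng = random.Random(1)
    n, tau2 = 20, 1.0  # illustrative values, not taken from the paper
    for ybar_obs in (0.0, 1.0, 2.0):
        print(ybar_obs,
              round(ppp_mean(ybar_obs, n, tau2), 3),
              round(cppp_mean(ybar_obs, n, tau2, rng), 3))
```

In this toy setting the ppp stays close to 0.5 even for a sample mean that is unusual under the prior predictive distribution, while the calibrated value moves toward 0. Because the calibration step draws from the prior predictive distribution, it only makes sense when the prior is proper, which is the requirement the abstract refers to.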

Use this identifier to cite or link to this document: http://hdl.handle.net/11584/246