In this paper the performance of bagging in classification problems is theoretically analysed, using a framework developed in works by Tumer and Ghosh and extended by the authors. A bias-variance decomposition is derived, which relates the expected misclassification probability attained by linearly combining classifiers trained on N bootstrap replicates of a fixed training set to that attained by a single bootstrap replicate of the same training set. Theoretical results show that the expected misclassification probability of bagging has the same bias component as a single bootstrap replicate, while the variance component is reduced by a factor N. Experimental results show that the performance of bagging as a function of the number of bootstrap replicates follows quite well our theoretical prediction. It is finally shown that theoretical results derived for bagging also apply to other methods for constructing multiple classifiers based on randomisation, such as the random subspace method and tree randomisation.

Dynamics of Variance Reduction in Bagging and Other Techniques Based on Randomisation

FUMERA, GIORGIO;ROLI, FABIO;
2005-01-01

Abstract

In this paper the performance of bagging in classification problems is theoretically analysed, using a framework developed in works by Tumer and Ghosh and extended by the authors. A bias-variance decomposition is derived, which relates the expected misclassification probability attained by linearly combining classifiers trained on N bootstrap replicates of a fixed training set to that attained by a single bootstrap replicate of the same training set. Theoretical results show that the expected misclassification probability of bagging has the same bias component as a single bootstrap replicate, while the variance component is reduced by a factor N. Experimental results show that the performance of bagging as a function of the number of bootstrap replicates follows quite well our theoretical prediction. It is finally shown that theoretical results derived for bagging also apply to other methods for constructing multiple classifiers based on randomisation, such as the random subspace method and tree randomisation.
2005
978-3-540-26306-7
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/15431
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 14
  • ???jsp.display-item.citation.isi??? 9
social impact