A parameter randomization approach for constructing classifier ensembles
SANTUCCI, ENRICA; DIDACI, LUCA; FUMERA, GIORGIO; ROLI, FABIO
2017-01-01
Abstract
Randomization-based techniques for classifier ensemble construction, such as Bagging and Random Forests, are well known and widely used. They consist of independently training the ensemble members on random perturbations of the training data or with random changes to the learning algorithm. We argue that randomization techniques can also be defined by directly manipulating the parameters of the base classifier, i.e., by sampling their values from a given probability distribution. A classifier ensemble can thus be built without manipulating the training data or the learning algorithm, and without then running the learning algorithm to obtain the individual classifiers. The key issue is to define a suitable parameter distribution for a given base classifier. This also allows one to re-implement existing randomization techniques by sampling the classifier parameters from the distribution implicitly defined by such techniques, if it is known or can be approximated, instead of explicitly manipulating the training data and running the learning algorithm. In this work we provide a first investigation of our approach, starting from an existing randomization technique (Bagging): we analytically approximate the parameter distribution for three well-known classifiers (nearest-mean, linear and quadratic discriminant), and empirically show that it generates ensembles very similar to those produced by Bagging. We also give a first example of the definition of a novel randomization technique based on our approach.

File | Description | Type | Size | Format |
---|---|---|---|---|
paper.pdf (archive managers only) | Full article | Post-print version (AAM) | 376.71 kB | Adobe PDF |
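The parameter-sampling idea described in the abstract can be illustrated with a minimal sketch for the nearest-mean classifier, whose only parameters are the class means. The sketch below is not the paper's derivation: it assumes, for illustration only, that the mean of class k over a bootstrap replicate is roughly Gaussian, centred on the sample mean with covariance equal to the sample covariance divided by the class size, and it ignores the variation in per-class counts across replicates. The function names `sample_nearest_mean_ensemble` and `predict_majority` are hypothetical.

```python
import numpy as np


def sample_nearest_mean_ensemble(X, y, n_members, rng):
    """Sample class means for each ensemble member instead of retraining.

    Assumption (illustrative, not taken from the paper): the class-k mean is
    drawn from N(sample_mean_k, sample_cov_k / n_k).
    Returns the class labels and an array of shape (n_members, n_classes, d).
    """
    classes = np.unique(y)
    means = np.empty((n_members, len(classes), X.shape[1]))
    for j, c in enumerate(classes):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        # Covariance of the sample mean under the Gaussian approximation.
        cov = np.atleast_2d(np.cov(Xc, rowvar=False)) / len(Xc)
        means[:, j, :] = rng.multivariate_normal(mu, cov, size=n_members)
    return classes, means


def predict_majority(classes, means, X):
    """Majority vote over members; each member assigns the nearest class mean."""
    # Squared distances with shape (n_members, n_samples, n_classes).
    d2 = ((X[None, :, None, :] - means[:, None, :, :]) ** 2).sum(axis=-1)
    votes = d2.argmin(axis=2)  # (n_members, n_samples)
    # Count the votes received by each class for each sample.
    counts = (votes[:, :, None] == np.arange(len(classes))).sum(axis=0)
    return classes[counts.argmax(axis=1)]
```

No bootstrap replicate is drawn and no learning algorithm is run per member here: each member is defined entirely by its sampled means. A Bagging baseline for comparison would instead resample the rows of `X` with replacement and recompute the class means for every member.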
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.