We developed three different methods for dataset analysis and ensemble enhance- ment. They share the underlying idea that an accurate preprocessing and adap- tation of the data can improve the system performance, without changing the classification model. Correlation Score is a generic framework for assessing encoding techniques by measuring the correlation between the encoded feature vectors and the corresponding class labels; experiments show its effectiveness in discovering the best encoding configurations between those tested, on a wide range of classification domains. Multi-Resolution Complexity Analysis is a method for assessing the local complexity inside a given domain. It is able to split a domain into regions of different classification complexity, giving insights on the inner structure of the populations inside the domain. Finally, Forests of Local Trees are a novel training algorithm for ensemble classifiers. They are based on the concept of local trees: classifiers trained with a bias toward a certain region of the domain. This bias enhances the diversity inside the ensemble, leading to improved performance. These three topics are meant as a foundation for a more complex framework, that will eventually utilize them organically.
Dataset analysis for classifier ensemble enhancement
TAMPONI, EMANUELE
2015-04-27
Abstract
We developed three different methods for dataset analysis and ensemble enhance- ment. They share the underlying idea that an accurate preprocessing and adap- tation of the data can improve the system performance, without changing the classification model. Correlation Score is a generic framework for assessing encoding techniques by measuring the correlation between the encoded feature vectors and the corresponding class labels; experiments show its effectiveness in discovering the best encoding configurations between those tested, on a wide range of classification domains. Multi-Resolution Complexity Analysis is a method for assessing the local complexity inside a given domain. It is able to split a domain into regions of different classification complexity, giving insights on the inner structure of the populations inside the domain. Finally, Forests of Local Trees are a novel training algorithm for ensemble classifiers. They are based on the concept of local trees: classifiers trained with a bias toward a certain region of the domain. This bias enhances the diversity inside the ensemble, leading to improved performance. These three topics are meant as a foundation for a more complex framework, that will eventually utilize them organically.File | Dimensione | Formato | |
---|---|---|---|
PHD_Thesis_Tamponi.pdf
accesso aperto
Tipologia:
Tesi di dottorato
Dimensione
1.76 MB
Formato
Adobe PDF
|
1.76 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.