We developed three different methods for dataset analysis and ensemble enhance- ment. They share the underlying idea that an accurate preprocessing and adap- tation of the data can improve the system performance, without changing the classification model. Correlation Score is a generic framework for assessing encoding techniques by measuring the correlation between the encoded feature vectors and the corresponding class labels; experiments show its effectiveness in discovering the best encoding configurations between those tested, on a wide range of classification domains. Multi-Resolution Complexity Analysis is a method for assessing the local complexity inside a given domain. It is able to split a domain into regions of different classification complexity, giving insights on the inner structure of the populations inside the domain. Finally, Forests of Local Trees are a novel training algorithm for ensemble classifiers. They are based on the concept of local trees: classifiers trained with a bias toward a certain region of the domain. This bias enhances the diversity inside the ensemble, leading to improved performance. These three topics are meant as a foundation for a more complex framework, that will eventually utilize them organically.

Dataset analysis for classifier ensemble enhancement

TAMPONI, EMANUELE
2015-04-27

Abstract

We developed three different methods for dataset analysis and ensemble enhance- ment. They share the underlying idea that an accurate preprocessing and adap- tation of the data can improve the system performance, without changing the classification model. Correlation Score is a generic framework for assessing encoding techniques by measuring the correlation between the encoded feature vectors and the corresponding class labels; experiments show its effectiveness in discovering the best encoding configurations between those tested, on a wide range of classification domains. Multi-Resolution Complexity Analysis is a method for assessing the local complexity inside a given domain. It is able to split a domain into regions of different classification complexity, giving insights on the inner structure of the populations inside the domain. Finally, Forests of Local Trees are a novel training algorithm for ensemble classifiers. They are based on the concept of local trees: classifiers trained with a bias toward a certain region of the domain. This bias enhances the diversity inside the ensemble, leading to improved performance. These three topics are meant as a foundation for a more complex framework, that will eventually utilize them organically.
27-apr-2015
classification
complexity
correlation
data mining
diversity
ensemble
preprocessing
random forest
File in questo prodotto:
File Dimensione Formato  
PHD_Thesis_Tamponi.pdf

accesso aperto

Tipologia: Tesi di dottorato
Dimensione 1.76 MB
Formato Adobe PDF
1.76 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/266597
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact