UNICA IRIS Institutional Research Information System

Exploiting the ensemble paradigm for stable feature selection: A case study on high-dimensional genomic data

Pes, Barbara (first author); Dessi, Nicoletta; Angioni, Marta

2017-01-01

Abstract

Ensemble classification is a well-established approach that involves fusing the decisions of multiple predictive models. A similar “ensemble logic” has recently been applied to challenging feature selection tasks aimed at identifying the most informative variables (or features) for a given domain of interest. In this work, we discuss the rationale of ensemble feature selection and evaluate the effects and implications of a specific ensemble approach, namely the data perturbation strategy. It consists of combining multiple selectors that exploit the same core algorithm but are trained on different perturbed versions of the original data. The real potential of this approach, still a matter of debate in the feature selection literature, is investigated here in conjunction with different kinds of core selection algorithms (both univariate and multivariate). In particular, we evaluate the extent to which the ensemble implementation improves the overall performance of the selection process, in terms of predictive accuracy and stability (i.e., robustness with respect to changes in the training data). Furthermore, we measure the impact of the ensemble approach on the final selection outcome, i.e., on the composition of the selected feature subsets. The results obtained on ten public genomic benchmarks provide useful insight into both the benefits and the limitations of such an ensemble approach, paving the way for the exploration of new and wider ensemble schemes.
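
The data perturbation strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the data are perturbed, which core selectors are used, or how the individual outputs are combined. The sketch assumes bootstrap resampling as the perturbation, a simple univariate score (absolute difference of class means) as the core algorithm, rank-sum aggregation across selectors, and Jaccard similarity as a stability measure. All function names are hypothetical.

```python
import random
from statistics import mean


def score_features(X, y):
    """Hypothetical univariate core selector (an assumption, not from the paper):
    score each feature by the absolute difference of its class means.
    Assumes binary labels 0/1."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def ranks_from_scores(scores):
    """Convert scores to ranks, where rank 0 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for rank, j in enumerate(order):
        ranks[j] = rank
    return ranks


def bootstrap_sample(X, y, rng):
    """Perturb the data by sampling instances with replacement.
    Resamples until every class of y is represented."""
    n = len(X)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({y[i] for i in idx}) == len(set(y)):
            return [X[i] for i in idx], [y[i] for i in idx]


def ensemble_select(X, y, k, n_selectors=20, seed=0):
    """Run the same core selector on n_selectors bootstrap samples,
    sum the per-feature ranks, and return the indices of the k
    features with the lowest rank sum (sorted ascending)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    rank_sums = [0] * n_features
    for _ in range(n_selectors):
        Xb, yb = bootstrap_sample(X, y, rng)
        for j, rank in enumerate(ranks_from_scores(score_features(Xb, yb))):
            rank_sums[j] += rank
    best = sorted(range(n_features), key=lambda j: rank_sums[j])[:k]
    return sorted(best)


def jaccard(subset_a, subset_b):
    """Stability proxy: overlap between two selected feature subsets."""
    a, b = set(subset_a), set(subset_b)
    return len(a & b) / len(a | b)
```

Calling `ensemble_select(X, y, k)` on two different training splits and comparing the returned subsets with `jaccard` gives a rough indication of selection stability in the sense used above (robustness to changes in the training data).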

Publication year: 2017

Keywords: ensemble paradigm; feature selection; data perturbation; selection stability; high-dimensional genomic data

Type: 1.1 Journal article

Files in this record:

File	Description	Type	Access	Size	Format
INFFUS_2016.pdf	Main article	Publisher's version	Archive managers only	1.65 MB	Adobe PDF
INFFUS2017_eprint_cc.pdf	Main article	Post-print version	Open access	754.47 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/185225
