Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains

Pes, Barbara (first author)
2020-01-01

Abstract

Selecting a subset of relevant features is crucial to the analysis of high-dimensional datasets from many application domains, such as biomedical data, document analysis and image analysis. Since no single selection algorithm appears capable of ensuring optimal results in terms of both predictive performance and stability (i.e. robustness to changes in the input data), researchers have increasingly explored the effectiveness of “ensemble” approaches that combine different selectors. While interesting proposals have been reported in the literature, most have so far been evaluated in a limited number of settings (e.g. with data from a single domain and in conjunction with specific selection approaches), leaving open important questions about the large-scale applicability and utility of ensemble feature selection. To contribute to the field, this work presents an empirical study that encompasses different kinds of selection algorithms (filters and embedded methods, univariate and multivariate techniques) and different application domains. Specifically, we consider 18 classification tasks with heterogeneous characteristics (in terms of number of classes and instances-to-features ratio) and experimentally evaluate, for feature subsets of different cardinalities, the extent to which an ensemble approach is more robust than a single selector, thus providing useful insight for both researchers and practitioners.
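The comparison described above (a single selector versus an ensemble of selectors, judged by how stable the selected top-k subsets remain when the input data are perturbed) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's protocol: the two univariate filters (`mean_difference_ranking`, `fisher_ranking`), mean-rank aggregation, random 90% subsampling, and average pairwise Jaccard similarity are hypothetical choices made here for illustration, and the data are synthetic. The paper's actual selectors, aggregation strategy and stability index may differ.

```python
import random
from itertools import combinations
from statistics import mean, pvariance


def _split(X, y, j):
    """Values of feature j for class-1 and class-0 instances."""
    a = [row[j] for row, label in zip(X, y) if label == 1]
    b = [row[j] for row, label in zip(X, y) if label == 0]
    return a, b


def mean_difference_ranking(X, y):
    """Hypothetical filter: rank features by |mean_1 - mean_0|, best first."""
    def score(j):
        a, b = _split(X, y, j)
        return abs(mean(a) - mean(b))
    return sorted(range(len(X[0])), key=lambda j: (-score(j), j))


def fisher_ranking(X, y):
    """Hypothetical filter: Fisher-style score (squared mean gap over summed variances)."""
    def score(j):
        a, b = _split(X, y, j)
        return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-12)
    return sorted(range(len(X[0])), key=lambda j: (-score(j), j))


def mean_rank_aggregation(rankings):
    """Ensemble step (assumed): order features by their average position across rankings."""
    total = {f: 0 for f in rankings[0]}
    for ranking in rankings:
        for pos, f in enumerate(ranking):
            total[f] += pos
    # Ties are broken by feature index so the result is deterministic.
    return sorted(total, key=lambda f: (total[f] / len(rankings), f))


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def stability(subsets):
    """Average pairwise Jaccard similarity of the selected subsets (1.0 = identical)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_data(n=40, p=30, informative=5, seed=0):
    """Synthetic binary task: the first `informative` features are shifted for class 1."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(1.0 if (j < informative and label == 1) else 0.0, 1.0)
          for j in range(p)] for label in y]
    return X, y


def run(runs=10, k=5, frac=0.9, seed=1):
    """Compare top-k stability of one selector vs. a two-selector ensemble."""
    X, y = make_data()
    rng = random.Random(seed)
    single, ensemble = [], []
    for _ in range(runs):
        # Perturb the input data by random subsampling without replacement.
        idx = rng.sample(range(len(y)), int(frac * len(y)))
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        r1, r2 = mean_difference_ranking(Xs, ys), fisher_ranking(Xs, ys)
        single.append(r1[:k])
        ensemble.append(mean_rank_aggregation([r1, r2])[:k])
    return stability(single), stability(ensemble)


if __name__ == "__main__":
    s_single, s_ens = run()
    print(f"single selector: {s_single:.3f}  ensemble: {s_ens:.3f}")
```

Varying `k` mimics evaluating feature subsets of different cardinalities. Because the two filters here are both univariate and closely related, the sketch mainly shows the mechanics; the study itself combines more heterogeneous selectors (filters and embedded methods, univariate and multivariate techniques).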
Year: 2020
Keywords: Ensemble approaches; Feature selection; High-dimensional data analysis; Stability of feature selection algorithms
Files in this item:
Pes2020_Article_EnsembleFeatureSelectionForHig.pdf

Open access

Description: Main article
Type: Publisher's version (Version of Record, VoR)
Size: 4.29 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/262305
Citations
  • PubMed Central: N/A
  • Scopus: 158
  • Web of Science (ISI): 108