The evaluation of the intrinsic complexity of a supervised domain plays an important role in devising classification systems. Typically, the metrics used for this purpose produce an overall evaluation of the domain, without localizing the sources of complexity. In this work we propose a method for partitioning the feature space into subsets of different complexity. The most important outcome of the method is the possibility of preliminarily identifying hard and easy regions of the feature space. This possibility opens interesting theoretical and pragmatic scenarios, including the analysis of the classification error and the implementation of robust classification systems. A first group of experiments has been performed on synthetic datasets, devised to separately highlight specific and recurrent problems often found in real-world domains. In particular, the focus has been on class boundaries, noise, and density of samples. A second group of experiments, performed on selected real-world datasets, confirm the validity of the proposed method. The ultimate goal of our research is to devise a method for estimating the classification difficulty of a dataset. The proposed method makes a significant step in this direction, as it is able to partition a given dataset according to the inherent complexity of the samples contained therein.

A Novel Method for Partitioning Feature Spaces According to their Inherent Classification Complexity

ARMANO, GIULIANO;
2013-01-01

Abstract

The evaluation of the intrinsic complexity of a supervised domain plays an important role in devising classification systems. Typically, the metrics used for this purpose produce an overall evaluation of the domain, without localizing the sources of complexity. In this work we propose a method for partitioning the feature space into subsets of different complexity. The most important outcome of the method is the possibility of preliminarily identifying hard and easy regions of the feature space. This possibility opens interesting theoretical and pragmatic scenarios, including the analysis of the classification error and the implementation of robust classification systems. A first group of experiments has been performed on synthetic datasets, devised to separately highlight specific and recurrent problems often found in real-world domains. In particular, the focus has been on class boundaries, noise, and density of samples. A second group of experiments, performed on selected real-world datasets, confirm the validity of the proposed method. The ultimate goal of our research is to devise a method for estimating the classification difficulty of a dataset. The proposed method makes a significant step in this direction, as it is able to partition a given dataset according to the inherent complexity of the samples contained therein.
2013
feature space partitioning, multiresolution analysis, machine learning
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/52893
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 2
social impact