In the last ten years, automatic Text Categorization (TC) has been gaining an increasing interest from the research community, due to the need to organize a massive number of digital documents. Following a machine learning paradigm, this paper presents a model which regards TC as a classification task supported by a wrapper approach and combines the utilization of a Genetic Algorithm (GA) with a filter. First, a filter is used to weigh the relevance of terms in documents. Then, the top-ranked terms are grouped in several nested sets of relatively small size. These sets are explored by a GA which extracts the subset of terms that best categorize documents. Experimental results on the Reuters-21578 dataset state the effectiveness of the proposed model and its competitiveness with the learning approaches proposed in the TC literature.

A model for term selection in text categorization problems

DESSI, NICOLETTA;DESSI', STEFANIA
2012-01-01

Abstract

In the last ten years, automatic Text Categorization (TC) has been gaining an increasing interest from the research community, due to the need to organize a massive number of digital documents. Following a machine learning paradigm, this paper presents a model which regards TC as a classification task supported by a wrapper approach and combines the utilization of a Genetic Algorithm (GA) with a filter. First, a filter is used to weigh the relevance of terms in documents. Then, the top-ranked terms are grouped in several nested sets of relatively small size. These sets are explored by a GA which extracts the subset of terms that best categorize documents. Experimental results on the Reuters-21578 dataset state the effectiveness of the proposed model and its competitiveness with the learning approaches proposed in the TC literature.
2012
978-1-4673-2621-6
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/105163
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 1
social impact