A Progressive Filtering Approach to Hierarchical Text Categorization

ARMANO, GIULIANO; VARGIU, ELOISA
2008-01-01

Abstract

The continuous growth of available document collections makes text categorization a challenging task. Real-world scenarios are typically characterized by a huge number of non-relevant documents compared with the documents a user is actually looking for. In this paper, we investigate how the ratio between relevant and non-relevant documents affects the performance of a classifier system. In particular, to counterbalance the negative impact of imbalanced inputs, we propose a novel progressive filtering technique. Experiments performed on the RCV1-v2 benchmark confirm the validity of the approach.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/108945