A Progressive Filtering Approach to Hierarchical Text Categorization
ARMANO, GIULIANO; VARGIU, ELOISA
2008-01-01
Abstract
The continuous growth of available document collections makes text categorization a challenging task. Real-world scenarios are typically characterized by a huge number of non-relevant documents compared with the documents a user is actually looking for. In this paper, we investigate how the ratio between relevant and non-relevant documents affects the performance of a classifier system. In particular, to counterbalance the negative impact of imbalanced inputs, we propose a novel progressive filtering technique. Experiments performed on the RCV1-v2 benchmark confirm the validity of the approach.