A Novel Semantic Approach to Document Collections
ARMANO, GIULIANO; VARGIU, ELOISA
2009-01-01
Abstract
Supervised text categorization increasingly depends on the availability of labeled document collections, which are typically built by having domain engineers classify documents by hand. In this paper, we propose a semantic text categorization approach that automatically creates document collections in which documents are classified according to the WordNet Domains taxonomy. In our experiments, we trained a classifier on an automatically created document collection and compared the results with those obtained by training the same classifier on a hand-made document collection. The results show that, on average, the performance of the automatic approach is quite similar to that obtained with a document collection classified by domain engineers.