A Fully Semantic Approach to Large Scale Text Categorization.

DESSI, NICOLETTA;DESSI', STEFANIA;PES, BARBARA
2013-01-01

Abstract

Text categorization is usually performed by supervised algorithms trained on large amounts of hand-labelled documents, which are labor-intensive to produce and often not available. To avoid this drawback, this paper proposes a text categorization approach designed to fully exploit semantic resources. It employs ontological knowledge not only as lexical support for disambiguating terms and deriving their sense inventory, but also to classify documents into topic categories. Specifically, our work applies two corpus-based thesauri (i.e., WordNet and WordNet Domains) to select the correct sense of words in a document, while using domain names for classification purposes. The experiments presented show that our approach performs well in classifying a large corpus of documents. A key part of the paper is the discussion of important aspects related to the use of surrounding words and of different methods for word sense disambiguation.
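
The pipeline outlined in the abstract (disambiguate each word using its surrounding words, then use the domain label of the chosen sense to categorize the document) can be illustrated with a minimal sketch. This is not the authors' implementation: the sense inventory `TOY_SENSES`, the function names `disambiguate` and `classify`, the simplified Lesk-style overlap measure, and the `window` parameter are all assumptions introduced here for illustration. A real system would take senses from WordNet and domain labels from WordNet Domains.

```python
from collections import Counter

# Hypothetical toy sense inventory: word -> list of (gloss words, domain label).
# A real system would read senses from WordNet and labels from WordNet Domains.
TOY_SENSES = {
    "bank": [
        ({"money", "deposit", "loan", "account"}, "economy"),
        ({"river", "water", "shore", "slope"}, "geography"),
    ],
    "interest": [
        ({"money", "loan", "rate", "percent"}, "economy"),
        ({"curiosity", "attention", "hobby"}, "psychology"),
    ],
    "loan": [({"money", "borrow", "lend"}, "economy")],
}


def disambiguate(word, context):
    """Simplified Lesk: pick the sense whose gloss overlaps most with the context."""
    senses = TOY_SENSES.get(word)
    if not senses:
        return None
    # On ties, max() keeps the first listed sense.
    return max(senses, key=lambda sense: len(sense[0] & context))


def classify(text, window=3):
    """Label a document with the domain its disambiguated senses vote for most."""
    tokens = text.lower().split()
    votes = Counter()
    for i, token in enumerate(tokens):
        # Surrounding words within the window, excluding the target itself.
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        sense = disambiguate(token, context)
        if sense is not None:
            votes[sense[1]] += 1
    return votes.most_common(1)[0][0] if votes else None
```

In this sketch no labelled training documents are needed: the category comes entirely from the lexical resource. The size of the context window and the choice of overlap measure are the kind of design decisions the abstract refers to when it mentions surrounding words and different disambiguation methods.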

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/99972