LexTex: a framework to generate lexicons using WordNet word senses in domain specific categories

Danilo Dessi; Diego Reforgiato Recupero
2022-01-01

Abstract

Lexicons have emerged as alternative resources to common supervised methods for classification or regression in different domains (e.g., Sentiment Analysis). These resources (especially lexical ones) lack important domain context, and it is not possible to tune, edit, or improve them for new domains and data. With the exponential production of data and annotations witnessed today in several domains, leveraging lexical resources to improve existing lexicons becomes a must. In this work, we provide a novel framework to build lexicons independently of the target domain and of the input categories into which each text needs to be classified. It employs state-of-the-art Natural Language Processing and Word Sense Disambiguation tools and techniques to make the method as general as possible. The framework takes as input a heterogeneous collection of texts annotated with respect to a fixed number of categories. Its output is a list of WordNet word senses with weights for each category. We demonstrate the effectiveness of the framework by taking the Emotion Detection task as a case study and employing the generated lexicons within that domain. The results confirm the effectiveness of the proposed framework. Additionally, the paper presents a use case on human-robot interaction within the Emotion Detection task. Furthermore, we applied our methodology in several other domains and compared our approach against common supervised methods (regressors), showing the effectiveness of the generated lexicons. By freely providing the framework, we aim at encouraging and disseminating the production of context-aware and domain-specific lexicons in other domains as well.
2022
Emotion detection; Lexicon generation; Word sense disambiguation
Files in this item:
2021 - LexTex a Framework to generate Lexicons using WordNet Word Senses in Domain Specific Categories_DESSI_JIIS_journal.pdf

Access: archive managers only

Type: pre-print version
Size: 498.64 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/321807