Introducing a weighted ontology to improve the graph-based semantic similarity measures

Saia, Roberto; Boratto, Ludovico; Carta, Salvatore Mario
2016-01-01

Abstract

Semantic similarity measures are designed to compare terms that belong to the same ontology. Many of them rely on a graph structure such as WordNet, the well-known lexical database for the English language, which groups words into sets of synonyms called synsets. Each synset is a unique vertex of the WordNet semantic graph, through which it is possible to obtain information about the relations between synsets. The literature offers several ways to determine the similarity between words or sentences through WordNet (e.g., by measuring the distance between the words, or by counting the number of edges between the corresponding synsets), but almost none of them takes into account the characteristics of the dataset in use. In some contexts this can lead to poor results, because only the relationships between vertices of the WordNet semantic graph are considered, and the vertices are not weighted according to the frequency of the synsets within the dataset. In other words, common and rare synsets are valued equally. This can create problems in applications such as recommender systems, where WordNet is used to evaluate the semantic similarity between the textual descriptions of the items a user has evaluated positively and the descriptions of the items not yet evaluated. In this context, user preferences need to be identified as accurately as possible; if synset frequency is ignored, certain items risk not being recommended, because the semantic similarity generated by the most common synsets in the descriptions of other items can prevail. This work addresses the problem by introducing a novel criterion for evaluating the similarity between words (and sentences) that exploits the WordNet semantic graph and adds to it the weight information of the synsets.
The effectiveness of the proposed strategy is verified in the recommender systems context, where recommendations are generated on the basis of the semantic similarity between the items stored in the user profiles and the items not yet evaluated.
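
The idea of weighting synsets by their frequency can be illustrated with a minimal sketch. The abstract does not state the weighting formula used in the paper, so everything below is an assumption made for illustration only: a toy graph stands in for WordNet, similarity between two synsets is the common edge-counting measure 1 / (1 + distance), and the weight is a hypothetical inverse-frequency (IDF-style) score computed over item descriptions. All function names and data are invented for this example.

```python
import math
from collections import Counter, deque

# Toy undirected "is-a" graph standing in for the WordNet semantic graph.
# The node names are hypothetical and are not real WordNet synset identifiers.
EDGES = [
    ("entity", "object"),
    ("object", "artifact"),
    ("artifact", "film"),
    ("artifact", "book"),
    ("film", "thriller"),
    ("film", "documentary"),
]


def build_graph(edges):
    """Return an adjacency map {node: set(neighbours)} for an undirected graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def edge_distance(graph, source, target):
    """Number of edges on the shortest path (breadth-first search), or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Edge-counting similarity: 1 / (1 + shortest path length)."""
    dist = edge_distance(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def synset_weights(descriptions):
    """Assumed IDF-style weight: synsets found in fewer descriptions weigh more."""
    n = len(descriptions)
    doc_freq = Counter(s for desc in descriptions for s in set(desc))
    return {s: math.log(1.0 + n / df) for s, df in doc_freq.items()}


def weighted_similarity(graph, weights, profile, candidate):
    """Weighted mean of pairwise similarities; each pair is weighted by w(a) * w(b)."""
    num = den = 0.0
    for a in profile:
        for b in candidate:
            w = weights.get(a, 0.0) * weights.get(b, 0.0)
            num += w * path_similarity(graph, a, b)
            den += w
    return num / den if den else 0.0
```

With item descriptions in which "film" occurs everywhere and "thriller" only once, the match on "film" contributes less to the weighted score than it does to a plain average, which is the effect the abstract describes: common synsets no longer dominate the comparison between a user profile and an unevaluated item.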
Keywords: Semantic graph; Semantic analysis; Ontology; Graph theory; Metrics

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/219261