UNICA IRIS Institutional Research Information System

Introducing a weighted ontology to improve the graph-based semantic similarity measures

Saia, Roberto; Boratto, Ludovico; Carta, Salvatore Mario

2016-01-01

Abstract

Semantic similarity measures are designed to compare terms that belong to the same ontology. Many of them rely on a graph structure such as WordNet, the well-known lexical database for the English language, which groups words into sets of synonyms called synsets. Each synset is a unique vertex of the WordNet semantic graph, through which it is possible to obtain information about the relations between synsets. The literature offers several ways to determine the similarity between words or sentences through WordNet (e.g., by measuring the distance between the words or by counting the number of edges between the corresponding synsets), but almost none of them takes into account the characteristics of the dataset in use. In some contexts this can lead to poor results, because only the relationships between vertices of the WordNet semantic graph are considered, without weighting the vertices by the frequency of their synsets within the dataset. In other words, common synsets and rare synsets are valued equally. This can create problems in applications such as recommender systems, where WordNet is used to evaluate the semantic similarity between the textual descriptions of the items that users have rated positively and the descriptions of the items not yet evaluated. In this context, user preferences must be identified as accurately as possible; if synset frequency is ignored, certain items risk not being recommended, because the semantic similarity produced by the most common synsets in the descriptions of other items may prevail. This work addresses the problem by introducing a novel criterion for evaluating the similarity between words (and sentences) that exploits the WordNet semantic graph and adds synset weight information to it.
The effectiveness of the proposed strategy is verified in the recommender systems context, where recommendations are generated on the basis of the semantic similarity between the items stored in the user profiles and the items not yet evaluated.
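
The idea of weighting a graph-based similarity by synset frequency can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything here is an assumption made for illustration: the toy graph stands in for a WordNet fragment, the weight is an inverse-document-frequency style value, and the names `path_similarity`, `synset_weights`, and `weighted_similarity` are hypothetical. It is not the authors' method.

```python
from collections import Counter, deque
import math

# Hypothetical toy graph standing in for a fragment of WordNet.
EDGES = [
    ("entity", "animal"), ("animal", "dog"), ("animal", "cat"),
    ("entity", "artifact"), ("artifact", "car"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_similarity(graph, a, b):
    """Edge-counting similarity: 1 / (1 + shortest path length)."""
    if a == b:
        return 1.0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                # b is reached at distance dist + 1 from a.
                return 1.0 / (2 + dist)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0  # no path between a and b


def synset_weights(documents):
    """Assumed IDF-like weight: synsets that are rare in the dataset weigh more."""
    n = len(documents)
    doc_freq = Counter(s for doc in documents for s in set(doc))
    return {s: math.log(1 + n / c) for s, c in doc_freq.items()}


def weighted_similarity(graph, weights, a, b):
    """Scale the graph similarity by the mean dataset weight of the two synsets."""
    mean_weight = (weights.get(a, 1.0) + weights.get(b, 1.0)) / 2
    return path_similarity(graph, a, b) * mean_weight


GRAPH = build_graph(EDGES)
# Hypothetical item descriptions, already mapped to synset identifiers.
DOCS = [
    ["animal", "artifact", "dog"],
    ["animal", "artifact", "cat"],
    ["animal", "artifact", "car"],
]
WEIGHTS = synset_weights(DOCS)
```

In this toy dataset, "dog"/"cat" and "animal"/"artifact" are both two edges apart, so the plain edge-counting measure scores the two pairs identically. Because "animal" and "artifact" occur in every description while "dog" and "cat" occur once each, `weighted_similarity` ranks the rare pair higher, which is the effect the abstract argues for.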

Year of publication: 2016
Keywords: Semantic graph; Semantic analysis; Ontology; Graph theory; Metrics
Type: 1.1 Journal article

Files in this item:

File	Size	Format
paper.pdf (main article; publisher's version, VoR; access restricted to archive managers)	1.58 MB	Adobe PDF

The metadata in IRIS UNICA are released under the Creative Commons CC0 1.0 Universal license, while the publication files are protected by copyright unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/219261
