A machine-learning approach to ranking RDF properties

Dessi, Andrea; Atzori, Maurizio
2016

Abstract

In this paper we address the problem of providing an order of relevance, or ranking, among the properties of entities used in RDF datasets, Linked Data and SPARQL endpoints. We first motivate the importance of ranking RDF properties by presenting two killer applications for the problem, namely property tagging and entity visualization. Driven by the requirements of these applications, we propose to apply Machine Learning to Rank (MLR) techniques to the problem of ranking RDF properties. Our solution is based on an in-depth empirical study of all the dimensions involved: feature selection, MLR algorithm and model training. The major advantages of our approach are the following: (a) flexibility/personalization, as the properties' relevance can be user-specified by personalizing the training set in a supervised approach, or set by a novel automatic classification approach based on SWiPE; (b) speed, since it can be applied without computing frequencies over the whole dataset, leveraging existing fast MLR algorithms; (c) effectiveness, as it can be applied even when no ontology data is available by using novel dataset-independent features; (d) precision, which is high in terms of both F-measure and Spearman's rho. Experimental results show that the proposed MLR framework outperforms the two existing approaches in the literature related to RDF property ranking.
Keywords: Fast property ranking; Machine learning; Semantic web; User experience; Hardware and Architecture; Software; Computer Networks and Communications
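The abstract reports ranking quality in terms of F-measure and Spearman's rho. As an illustration only, the following Python sketch ranks a handful of properties with a linear scoring function (a stand-in for a trained MLR model) and compares the result against a gold ranking with both metrics. The property names, feature values and weights are hypothetical placeholders, and the helper names (`rank_by_score`, `spearman_rho`, `f_measure`) are invented for this sketch; the paper's actual features, MLR algorithms and evaluation protocol are not reproduced here.

```python
from typing import Dict, List, Set

# Hypothetical toy data: property names, feature values and weights are
# illustrative placeholders, not taken from the paper.
GOLD: List[str] = ["dbo:birthDate", "dbo:birthPlace", "dbo:occupation", "dbo:wikiPageID"]
FEATURES: Dict[str, List[float]] = {
    "dbo:birthDate": [0.8, 0.2],
    "dbo:birthPlace": [0.7, 0.6],
    "dbo:occupation": [0.5, 0.4],
    "dbo:wikiPageID": [0.1, 0.0],
}
WEIGHTS: List[float] = [1.0, 0.5]  # stands in for the weights of a trained linear model


def rank_by_score(features: Dict[str, List[float]], weights: List[float]) -> List[str]:
    """Pointwise ranking: score each property linearly, then sort by descending score."""
    def score(prop: str) -> float:
        return sum(w * f for w, f in zip(weights, features[prop]))
    return sorted(features, key=score, reverse=True)


def spearman_rho(gold: List[str], predicted: List[str]) -> float:
    """Spearman's rho for two tie-free rankings of the same properties:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the rank difference."""
    if (len(gold) != len(set(gold)) or len(predicted) != len(gold)
            or set(gold) != set(predicted)):
        raise ValueError("rankings must contain the same distinct properties")
    n = len(gold)
    if n < 2:
        raise ValueError("need at least two properties")
    gold_rank = {p: i for i, p in enumerate(gold)}
    pred_rank = {p: i for i, p in enumerate(predicted)}
    d2 = sum((gold_rank[p] - pred_rank[p]) ** 2 for p in gold)
    return 1 - 6 * d2 / (n * (n * n - 1))


def f_measure(gold_top: Set[str], predicted_top: Set[str]) -> float:
    """F1 score between the sets of properties judged relevant (e.g. the top-k)."""
    tp = len(gold_top & predicted_top)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_top)
    recall = tp / len(gold_top)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = rank_by_score(FEATURES, WEIGHTS)
    print(predicted)
    print("Spearman's rho:", spearman_rho(GOLD, predicted))
    print("F-measure@2:", f_measure(set(GOLD[:2]), set(predicted[:2])))
```

With these toy inputs the scorer swaps the two top properties, so two rank differences of 1 give ρ = 1 − 6·2/(4·15) = 0.8, while the top-2 F-measure is 1.0 because the same two properties are selected. This illustrates why the two metrics are complementary: F-measure checks which properties are deemed relevant, whereas Spearman's rho also penalizes their relative order.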
Files in this item:
FGCS15_RankProperties.pdf

Access: archive managers only

Type: post-print version
Size: 753.29 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/136457
Citations
  • PubMed Central: N/A
  • Scopus: 34
  • Web of Science: 28