In this paper we address the problem of providing an order of relevance, or ranking, among entities' properties used in RDF datasets, Linked Data and SPARQL endpoints. We first motivate the importance of ranking RDF properties by providing two killer applications for the problem, namely property tagging and entity visualization. Moved by the desiderata of these applications, we propose to apply Machine Learning to Rank (MLR) techniques to the problem of ranking RDF properties. Our devised solution is based on a deep empirical study of all the dimensions involved: feature selection, MLR algorithm and Model training. The major advantages of our approach are the following: (a) flexibility/personalization, as the properties' relevance can be user-specified by personalizing the training set in a supervised approach, or set by a novel automatic classification approach based on SWiPE; (b) speed, since it can be applied without computing frequencies over the whole dataset, leveraging existing fast MLR algorithms; (c) effectiveness, as it can be applied even when no ontology data is available by using novel dataset-independent features; (d) precision, which is high both in terms of f-measure and Spearman's rho. Experimental results show that the proposed MLR framework outperform the two existing approaches found in literature which are related to RDF property ranking.
A machine-learning approach to ranking RDF properties
DESSI, ANDREA;ATZORI, MAURIZIO
2016-01-01
Abstract
In this paper we address the problem of providing an order of relevance, or ranking, among entities' properties used in RDF datasets, Linked Data and SPARQL endpoints. We first motivate the importance of ranking RDF properties by providing two killer applications for the problem, namely property tagging and entity visualization. Moved by the desiderata of these applications, we propose to apply Machine Learning to Rank (MLR) techniques to the problem of ranking RDF properties. Our devised solution is based on a deep empirical study of all the dimensions involved: feature selection, MLR algorithm and Model training. The major advantages of our approach are the following: (a) flexibility/personalization, as the properties' relevance can be user-specified by personalizing the training set in a supervised approach, or set by a novel automatic classification approach based on SWiPE; (b) speed, since it can be applied without computing frequencies over the whole dataset, leveraging existing fast MLR algorithms; (c) effectiveness, as it can be applied even when no ontology data is available by using novel dataset-independent features; (d) precision, which is high both in terms of f-measure and Spearman's rho. Experimental results show that the proposed MLR framework outperform the two existing approaches found in literature which are related to RDF property ranking.File | Dimensione | Formato | |
---|---|---|---|
FGCS15_RankProperties.pdf
Solo gestori archivio
Tipologia:
versione post-print (AAM)
Dimensione
753.29 kB
Formato
Adobe PDF
|
753.29 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.