UNICA IRIS Institutional Research Information System

R-LDA: Profiling RDF Datasets Using Knowledge-Based Topic Modeling

Pouriyeh S.; Allahyari M.; Cheng G.; Arabnia H. R.; Kochut K.; Atzori M.

2019-01-01

Abstract

Linked Open Data (LOD) has recently grown exponentially as large volumes of datasets have been published on the Web. This vast amount of information needs to be searched, queried, and interlinked more easily than before. Data publishers are encouraged to provide summary information about the datasets they publish on the Web. This information, which functions as metadata, makes those datasets easier to discover. Because this is not always done, many datasets lack a proper profile, which creates high demand for data profiling techniques. In this paper, we focus on RDF dataset profiling using unsupervised machine learning techniques, namely knowledge-based topic modeling. We also investigate the use of Wikipedia categories to represent the topics identified in an RDF dataset. In the proposed model, we extract a number of representative topics for an RDF dataset and annotate them with Wikipedia categories. The union of the assigned categories serves as a profile of the dataset, in the sense that it provides an overall characterization of the dataset's content.
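The last step of the abstract, taking the union of the categories assigned to each topic as the dataset profile, can be illustrated with a small Python sketch. This is not the R-LDA model: the topics, the category-to-word table, and the word-overlap scoring are all hypothetical stand-ins chosen for illustration, and the function names `annotate_topic` and `build_profile` are assumptions, not names from the paper.

```python
from typing import Dict, List, Set

# Hypothetical topics: each is a list of top words, as a topic model might
# produce from the textual content of an RDF dataset (assumed data).
TOPICS: List[List[str]] = [
    ["protein", "gene", "enzyme", "cell"],
    ["river", "mountain", "lake", "city"],
]

# Hypothetical stand-in for Wikipedia categories and words associated with them.
CATEGORY_WORDS: Dict[str, Set[str]] = {
    "Molecular biology": {"protein", "gene", "enzyme", "dna"},
    "Geography": {"river", "mountain", "lake", "country"},
    "Music": {"album", "band", "song"},
}


def annotate_topic(
    topic_words: List[str],
    category_words: Dict[str, Set[str]],
    top_k: int = 1,
) -> List[str]:
    """Return up to top_k categories ranked by word overlap with the topic.

    Simple overlap counting is an illustrative assumption; categories with
    no overlap are never assigned.
    """
    words = set(topic_words)
    scored = [
        (len(words & cat_words), name)
        for name, cat_words in category_words.items()
    ]
    scored = [(score, name) for score, name in scored if score > 0]
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


def build_profile(
    topics: List[List[str]],
    category_words: Dict[str, Set[str]],
    top_k: int = 1,
) -> Set[str]:
    """Dataset profile = union of the categories assigned to each topic."""
    profile: Set[str] = set()
    for topic_words in topics:
        profile.update(annotate_topic(topic_words, category_words, top_k))
    return profile


if __name__ == "__main__":
    print(sorted(build_profile(TOPICS, CATEGORY_WORDS)))
```

Using a set for the profile means a category assigned to several topics appears only once, which matches the idea of the profile as a union.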

Year: 2019
ISBN: 978-1-5386-6783-5
Keywords: Ontology; RDF profiling; Topic modeling
Type: 4.1 Conference proceedings contribution

Files in this item:

File	Access	Type	Size	Format
icsc19 - R-LDA Profiling RDF datasets using knowledge-based topic modeling (TEMP).pdf	Archive managers only	Post-print version (AAM)	391.01 kB	Adobe PDF

The metadata in IRIS UNICA are released under the Creative Commons CC0 1.0 Universal license, while the publication files are protected by copyright unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/275310
