UNICA IRIS Institutional Research Information System

Knowledge Graphs for Digital Transformation Monitoring in Social Media

Zavarella V.;Reforgiato Recupero D.;Consoli S.;Fenu G.;Angioni S.;Buscaldi D.;Dessi D.;Osborne F.

2024-01-01

Abstract

Several techniques and workflows have emerged recently for automatically extracting knowledge graphs from documents like scientific articles and patents. However, adapting these approaches to integrate alternative text sources such as micro-blogging posts and news and to model open-domain entities and relationships commonly found in these sources is still challenging. This paper introduces an improved information extraction pipeline designed specifically for extracting a knowledge graph comprising open-domain entities from micro-blogging posts on social media platforms. Our pipeline utilizes dependency parsing and employs unsupervised classification of entity relations through hierarchical clustering over word embeddings. We present a case study involving the extraction of semantic triples from a tweet collection concerning digital transformation and show through two experimental evaluations on the same dataset that our system achieves precision rates exceeding 95% and surpasses similar pipelines by approximately 5% in terms of precision, while also generating a notably higher number of triples.
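The abstract mentions unsupervised classification of entity relations through hierarchical clustering over word embeddings. As a rough illustration of that general idea only, and not the authors' implementation, the sketch below groups relation predicates by average-linkage agglomerative clustering under cosine distance. The toy 3-dimensional vectors, the predicate list, the function names and the `max_distance` threshold are all assumptions made up for this example; a real pipeline would take its vectors from a trained embedding model.

```python
import math
from itertools import combinations

# Toy 3-d "embeddings" for relation predicates.
# ASSUMPTION: these values are invented for illustration, not taken from the paper.
TOY_EMBEDDINGS = {
    "adopt":     (0.90, 0.10, 0.05),
    "implement": (0.85, 0.15, 0.10),
    "deploy":    (0.80, 0.20, 0.05),
    "acquire":   (0.10, 0.90, 0.10),
    "buy":       (0.15, 0.85, 0.05),
    "announce":  (0.05, 0.10, 0.90),
    "unveil":    (0.10, 0.05, 0.95),
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(c1, c2, vectors):
    """Mean pairwise cosine distance between the members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def cluster_predicates(vectors, max_distance):
    """Bottom-up (agglomerative) clustering of the keys of `vectors`.

    Starts with one cluster per predicate and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `max_distance` (a hypothetical cut-off chosen for this toy data).
    """
    clusters = [[word] for word in vectors]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        (i, j), dist = min(
            (((i, j), average_linkage(clusters[i], clusters[j], vectors))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > max_distance:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    # Sort for a deterministic, readable result.
    return sorted(sorted(c) for c in clusters)


relation_groups = cluster_predicates(TOY_EMBEDDINGS, max_distance=0.2)
print(relation_groups)
```

On this toy data the predicates fall into three groups: `acquire`/`buy`, `adopt`/`deploy`/`implement`, and `announce`/`unveil`. Each group could then be treated as one candidate relation type; how the paper labels or filters such clusters is not stated in the abstract.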

Year: 2024

Keywords: Hierarchical Clustering; Information Extraction; Knowledge Graphs; Named Entity Recognition; Social Media Analysis; Word Embeddings

Type: 4.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
Knowledge Graphs for Digital Transformation Monitoring in Social Media - text2kg_paper8.pdf (open access; type: publisher's version, VoR)	838.94 kB	Adobe PDF

The metadata in IRIS UNICA are released under the Creative Commons CC0 1.0 Universal licence, while the publication files are protected by copyright unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/426555
