In the current Digital Transformation scenario, Knowledge Graphs are essential for comprehending, representing, and exploiting complex information in a structured form. The main paradigm in automatically generating proper Knowledge Graphs relies on predefined schemas or ontologies. Such schemas are typically manually constructed, requiring an intensive human effort, and are often sensitive to information loss due to negligence, incomplete analysis, or human subjectivity or inclination. Limiting human bias and the resulting information loss in creating proper Knowledge Graphs is paramount, particularly for user modeling in various sectors, such as education or healthcare. To this end, we propose a novel approach to automatically generating a proper entity schema. The devised methodology combines the language understanding capabilities of LLM with classical machine learning methods such as clustering to properly build an entity schema from a set of documents. This solution eliminates the need for human intervention and fosters a more efficient and comprehensive knowledge representation. The assessment of our proposal concerns adopting a state-of-the-art entity extraction model (UniNER) to estimate the relevance of the extracted entities based on the generated schema. Results confirm the potential of our approach, as we observed a negligible difference between the topic similarity score obtained with the ground truth and with the automatically generated schema (less than 1% on average on three different datasets). Such an outcome confirms that the proposed approach may be valuable in automatically creating an entity schema from a set of documents.

Towards Zero-shot Knowledge Graph building: Automated Schema Inference

Carta, Salvatore;Giuliani, Alessandro
;
Manca, Marco Manolo
;
Piano, Leonardo
;
Tiddia, Sandro Gabriele
2024-01-01

Abstract

In the current Digital Transformation scenario, Knowledge Graphs are essential for comprehending, representing, and exploiting complex information in a structured form. The main paradigm in automatically generating proper Knowledge Graphs relies on predefined schemas or ontologies. Such schemas are typically manually constructed, requiring an intensive human effort, and are often sensitive to information loss due to negligence, incomplete analysis, or human subjectivity or inclination. Limiting human bias and the resulting information loss in creating proper Knowledge Graphs is paramount, particularly for user modeling in various sectors, such as education or healthcare. To this end, we propose a novel approach to automatically generating a proper entity schema. The devised methodology combines the language understanding capabilities of LLM with classical machine learning methods such as clustering to properly build an entity schema from a set of documents. This solution eliminates the need for human intervention and fosters a more efficient and comprehensive knowledge representation. The assessment of our proposal concerns adopting a state-of-the-art entity extraction model (UniNER) to estimate the relevance of the extracted entities based on the generated schema. Results confirm the potential of our approach, as we observed a negligible difference between the topic similarity score obtained with the ground truth and with the automatically generated schema (less than 1% on average on three different datasets). Such an outcome confirms that the proposed approach may be valuable in automatically creating an entity schema from a set of documents.
2024
Large Language Models; Named Entity Recognition; Ontology Learning
File in questo prodotto:
File Dimensione Formato  
3631700.3665234.pdf

accesso aperto

Tipologia: versione editoriale (VoR)
Dimensione 724.16 kB
Formato Adobe PDF
724.16 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/433785
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact