In the current Digital Transformation scenario, Knowledge Graphs are essential for comprehending, representing, and exploiting complex information in a structured form. The main paradigm in automatically generating proper Knowledge Graphs relies on predefined schemas or ontologies. Such schemas are typically manually constructed, requiring an intensive human effort, and are often sensitive to information loss due to negligence, incomplete analysis, or human subjectivity or inclination. Limiting human bias and the resulting information loss in creating proper Knowledge Graphs is paramount, particularly for user modeling in various sectors, such as education or healthcare. To this end, we propose a novel approach to automatically generating a proper entity schema. The devised methodology combines the language understanding capabilities of LLM with classical machine learning methods such as clustering to properly build an entity schema from a set of documents. This solution eliminates the need for human intervention and fosters a more efficient and comprehensive knowledge representation. The assessment of our proposal concerns adopting a state-of-the-art entity extraction model (UniNER) to estimate the relevance of the extracted entities based on the generated schema. Results confirm the potential of our approach, as we observed a negligible difference between the topic similarity score obtained with the ground truth and with the automatically generated schema (less than 1% on average on three different datasets). Such an outcome confirms that the proposed approach may be valuable in automatically creating an entity schema from a set of documents.
Towards Zero-shot Knowledge Graph building: Automated Schema Inference
Carta, Salvatore;Giuliani, Alessandro
;Manca, Marco Manolo
;Piano, Leonardo
;Tiddia, Sandro Gabriele
2024-01-01
Abstract
In the current Digital Transformation scenario, Knowledge Graphs are essential for comprehending, representing, and exploiting complex information in a structured form. The main paradigm in automatically generating proper Knowledge Graphs relies on predefined schemas or ontologies. Such schemas are typically manually constructed, requiring an intensive human effort, and are often sensitive to information loss due to negligence, incomplete analysis, or human subjectivity or inclination. Limiting human bias and the resulting information loss in creating proper Knowledge Graphs is paramount, particularly for user modeling in various sectors, such as education or healthcare. To this end, we propose a novel approach to automatically generating a proper entity schema. The devised methodology combines the language understanding capabilities of LLM with classical machine learning methods such as clustering to properly build an entity schema from a set of documents. This solution eliminates the need for human intervention and fosters a more efficient and comprehensive knowledge representation. The assessment of our proposal concerns adopting a state-of-the-art entity extraction model (UniNER) to estimate the relevance of the extracted entities based on the generated schema. Results confirm the potential of our approach, as we observed a negligible difference between the topic similarity score obtained with the ground truth and with the automatically generated schema (less than 1% on average on three different datasets). Such an outcome confirms that the proposed approach may be valuable in automatically creating an entity schema from a set of documents.File | Dimensione | Formato | |
---|---|---|---|
3631700.3665234.pdf
accesso aperto
Tipologia:
versione editoriale (VoR)
Dimensione
724.16 kB
Formato
Adobe PDF
|
724.16 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.