Text-mining, structured queries, and knowledge management on web document corpora

Atzori, Maurizio
2014-01-01

Abstract

Wikipedia's InfoBoxes play a crucial role in advanced applications and provide the main knowledge source for DBpedia and the powerful structured queries it supports. However, InfoBoxes, which were created by crowdsourcing for human rather than computer consumption, suffer from incompleteness, inconsistencies, and inaccuracies. To overcome these problems, we have developed (i) the IBminer system, which extracts InfoBox information by text-mining Wikipedia pages, (ii) the IKBStore system, which integrates the information derived by IBminer with that of DBpedia, YAGO2, WikiData, WordNet, and other sources, and (iii) SWiPE and InfoBox Editor (IBE), which provide user-friendly interfaces for querying and revising the knowledge base. IBminer uses a deep NLP-based approach to extract from text a semantic representation structure called TextGraph, from which the system detects patterns and derives subject-attribute-value relations, as well as domain-specific synonyms for the knowledge base. IKBStore and IBE complement the powerful, user-friendly, by-example structured queries of SWiPE by supporting validation and provenance history for the information contained in the knowledge base, along with the ability to upgrade that knowledge when it is found to be incomplete, incorrect, or outdated.
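The abstract says that IBminer detects patterns in TextGraph and derives subject-attribute-value relations from them, but it does not give the algorithm. The following is only a toy illustration of the general idea of pattern-based attribute mapping. It is not IBminer's method. It assumes that text has already been reduced to (subject, link, value) edges, a simplified stand-in for TextGraph. It then learns which existing InfoBox attribute each textual link most often agrees with, and proposes triples that are still missing. The names `TextTriple`, `learn_patterns`, `suggest_triples`, and the `min_support` threshold are hypothetical and were invented for this sketch.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class TextTriple(NamedTuple):
    """Hypothetical (subject, link, value) edge standing in for a TextGraph edge."""
    subject: str
    link: str
    value: str


def learn_patterns(text_triples, infobox):
    """Map each textual link to a Counter of InfoBox attributes it agrees with.

    `infobox` is a set of known (subject, attribute, value) triples.
    A link "agrees" with an attribute when both connect the same subject/value pair.
    """
    attrs_by_pair = defaultdict(set)
    for subj, attr, val in infobox:
        attrs_by_pair[(subj, val)].add(attr)

    patterns = defaultdict(Counter)
    for t in text_triples:
        for attr in attrs_by_pair.get((t.subject, t.value), ()):
            patterns[t.link][attr] += 1
    return patterns


def suggest_triples(text_triples, patterns, infobox, min_support=2):
    """Propose new (subject, attribute, value) triples not already in the InfoBox.

    Each link is mapped to its most frequent attribute, but only if that
    mapping was observed at least `min_support` times (a hypothetical threshold).
    """
    suggestions = set()
    for t in text_triples:
        counter = patterns.get(t.link)
        if not counter:
            continue  # link never matched a known attribute
        attr, support = counter.most_common(1)[0]
        candidate = (t.subject, attr, t.value)
        if support >= min_support and candidate not in infobox:
            suggestions.add(candidate)
    return suggestions


if __name__ == "__main__":
    text = [
        TextTriple("Italy", "has capital", "Rome"),
        TextTriple("France", "has capital", "Paris"),
        TextTriple("Germany", "has capital", "Berlin"),
    ]
    infobox = {("Italy", "capital", "Rome"), ("France", "capital", "Paris")}
    patterns = learn_patterns(text, infobox)
    print(suggest_triples(text, patterns, infobox))
    # {('Germany', 'capital', 'Berlin')}
```

In this example, the link "has capital" matches the `capital` attribute twice, which meets the threshold. The missing Germany entry is therefore suggested. A real system would also have to handle ambiguous links, synonyms, and conflicting sources, which is the kind of reconciliation the abstract attributes to IKBStore.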
Files in this item:
sigmod_recordj14 - Text-Mining, Structured Queries, and Knowledge Management on Web Document Corpora.pdf
Access: archive managers only
Type: publisher's version
Size: 533.3 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/57990
Citations
  • Scopus 10
  • Web of Science (ISI) 4