Wikipedia's InfoBoxes play a crucial role in advanced applications and provide the main knowledge source for DBpedia and the powerful structured queries it supports. However, InfoBoxes, which were created by crowdsourcing for human rather than computer consumption, suffer from incompleteness, inconsistencies, and inaccuracies. To overcome these problems, we have developed (i) the IBminer system, which extracts InfoBox information by text-mining Wikipedia pages, (ii) the IKBStore system, which integrates the information derived by IBminer with that of DBpedia, YAGO2, WikiData, WordNet, and other sources, and (iii) SWiPE and InfoBox Editor (IBE), which provide user-friendly interfaces for querying and revising the knowledge base. IBminer uses a deep NLP-based approach to extract from text a semantic representation structure called TextGraph, from which the system detects patterns and derives subject-attribute-value relations, as well as domain-specific synonyms for the knowledge base. IKBStore and IBE complement the powerful, user-friendly, by-example structured queries of SWiPE by supporting validation and provenance history for the information contained in the knowledge base, along with the ability to update that knowledge when it is found to be incomplete, incorrect, or outdated.
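As a rough illustration of the subject-attribute-value relations, synonym handling, and provenance tracking mentioned in the abstract, the sketch below merges triples from several sources and flags conflicting values. It is not the IBminer or IKBStore implementation: the `Triple` class, the `ATTRIBUTE_SYNONYMS` table, the source labels, and the sample facts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    attribute: str
    value: str
    source: str  # provenance: which (hypothetical) system asserted the fact


# Hypothetical synonym table mapping attribute variants to one canonical name.
ATTRIBUTE_SYNONYMS = {"birthplace": "birth_place", "born_in": "birth_place"}


def canonical(attribute: str) -> str:
    """Normalize spelling, then resolve known synonyms."""
    key = attribute.strip().lower().replace(" ", "_")
    return ATTRIBUTE_SYNONYMS.get(key, key)


def integrate(triples):
    """Group values by (subject, canonical attribute), keeping the sources of each value."""
    store = defaultdict(lambda: defaultdict(set))
    for t in triples:
        store[(t.subject, canonical(t.attribute))][t.value].add(t.source)
    return store


def conflicts(store):
    """Return entries where sources disagree, as candidates for manual review."""
    return {key: dict(values) for key, values in store.items() if len(values) > 1}


# Made-up sample facts, including one deliberately misspelled value.
facts = [
    Triple("Ada Lovelace", "birthplace", "London", "text-mining"),
    Triple("Ada Lovelace", "birth place", "London", "existing-kb"),
    Triple("Ada Lovelace", "born_in", "Londn", "crowdsourced"),
]
store = integrate(facts)
for key, values in conflicts(store).items():
    print(key, values)
```

Keeping the set of sources next to each value is one simple way to record provenance, so that a reviewer can see which source contributed a disputed value before correcting it.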
|Title:||Text-mining, structured queries, and knowledge management on web document corpora|
|Publication date:||2014|
|Type:||1.1 Journal article|
Files in this record:
|sigmod_recordj14 - Text-Mining, Structured Queries, and Knowledge Management on Web Document Corpora.pdf||Publisher's version||Administrator|