Is Wikipedia a latent gene ontology?

Dessì, Nicoletta;Atzori, Maurizio
2017

Abstract

Despite significant contributions from specialized ontologies and text mining methods, evaluating the semantic similarity of genes remains difficult because of the complex functions in which genes are involved. A less exploited resource is Wikipedia, which stores more than 10,400 articles about human genes: each gene name identifies the corresponding Wikipedia page, which summarizes the gene's properties in short sentences whose hyperlinks define relationships with other genes in Wikipedia. This paper evaluates the extent to which Wikipedia can be trusted for assessing the similarity of a gene pair, measured as the distance between their Wikipedia pages. We present a set of experiments that use TagMe (a powerful tool for evaluating the distance between two Wikipedia pages based on their annotations) to calculate the semantic similarity of several sets of genes on Wikipedia. The results compare well with gold standards and with semantic similarity values evaluated on gene ontologies. The paper demonstrates the effectiveness of Wikipedia in recognizing functional groups of genes, the quality and wealth of its knowledge about genes, as well as the accuracy of TagMe.
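To make the idea of "distance between two Wikipedia pages" concrete, the following is a minimal sketch of one common link-based relatedness measure (the Milne and Witten formulation, computed from the sets of pages that link to each article). The abstract does not state which measure TagMe applies, so the choice of formula is an assumption made for illustration only; the function name `link_relatedness`, the placeholder page names, and the in-link sets are all hypothetical toy data, not results from the paper.

```python
import math


def link_relatedness(inlinks_a, inlinks_b, total_pages):
    """Link-based relatedness in [0, 1] between two pages.

    inlinks_a, inlinks_b: sets of page identifiers linking to each page.
    total_pages: assumed total number of pages in the collection.
    Illustrative sketch only; not taken from the paper or from TagMe.
    """
    common = inlinks_a & inlinks_b
    if not common:
        # No shared in-links: treat the pages as unrelated.
        return 0.0
    larger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    denom = math.log(total_pages) - math.log(smaller)
    if denom <= 0:
        # Degenerate case: every page links to both articles.
        return 1.0
    distance = (math.log(larger) - math.log(len(common))) / denom
    # Clamp so that very distant pages score 0 rather than negative.
    return max(0.0, 1.0 - distance)


# Hypothetical toy in-link sets for three placeholder gene pages.
GENE_A = {"p1", "p2", "p3", "p4"}
GENE_B = {"p2", "p3", "p4", "p5"}
GENE_C = {"p4", "p9"}
TOTAL_PAGES = 1000  # assumed collection size for the toy example

if __name__ == "__main__":
    print(link_relatedness(GENE_A, GENE_B, TOTAL_PAGES))
    print(link_relatedness(GENE_A, GENE_C, TOTAL_PAGES))
```

In this toy setting the pair sharing three in-links scores higher than the pair sharing one, which is the behavior a similarity measure over gene pages would be expected to show.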
978-153861758-8
Gene relatedness; Semantic similarity; TagMe; Text mining; Wikipedia; Computer networks and communications; Business, management and accounting (miscellaneous); Hardware and architecture

Use this identifier to cite or link to this document: http://hdl.handle.net/11584/238961