Is Wikipedia a latent gene ontology?
Dessì, Nicoletta; Atzori, Maurizio
2017-01-01
Abstract
Despite the significant contribution of specialized ontologies and text mining methods, evaluating the semantic similarity of genes remains difficult because of the complex functions in which genes are involved. A less exploited resource is Wikipedia, which stores more than 10,400 articles about human genes: each gene name identifies the corresponding Wikipedia page, which summarizes the gene's properties in short sentences whose hyperlinks define relationships with other genes in Wikipedia. This paper evaluates the extent to which Wikipedia can be trusted for assessing the similarity of a gene pair, measured as the distance between their Wikipedia pages. We present a set of experiments that use TagMe (a powerful tool for evaluating the distance between two Wikipedia pages based on their annotations) to calculate the semantic similarity of several sets of genes on Wikipedia. The results compare well with gold standards and with semantic similarity values evaluated on gene ontologies. The paper demonstrates the effectiveness of Wikipedia in recognizing functional groups of genes, the quality and wealth of its knowledge about genes, and the accuracy of TagMe.

File | Size | Format |
---|---|---|
wetice17 - is wikipedia a latent gen ontology.pdf (access: archive managers only; type: publisher's version, VoR) | 460.84 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.