eGIFT: Mining Gene Information from the Literature

<p>Abstract</p> <p>Background</p> <p>With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irreleva...

Full description

Bibliographic Details
Main Authors: Tudor Catalina O, Schmidt Carl J, Vijay-Shanker K
Format: Article
Language:English
Published: BMC 2010-08-01
Series:BMC Bioinformatics
Online Access:http://www.biomedcentral.com/1471-2105/11/418
Description
Summary:<p>Abstract</p> <p>Background</p> <p>With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly get an overall picture about a specific gene from documents that mention its names and synonyms.</p> <p>Results</p> <p>In this paper, we present eGIFT (<url>http://biotm.cis.udel.edu/eGIFT</url>), a web-based tool that associates informative terms, called <it>i</it>Terms, and sentences containing them, with genes. To associate <it>i</it>Terms with a gene, eGIFT ranks <it>i</it>Terms about the gene, based on a score which compares the frequency of occurrence of a term in the gene's literature to its frequency of occurrence in documents about genes in general. To retrieve a gene's documents (Medline abstracts), eGIFT considers all gene names, aliases, and synonyms. Since many of the gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to this gene. Another additional filtering process is applied to retain those abstracts that focus on the gene rather than mention it in passing. eGIFT's information for a gene is pre-computed and users of eGIFT can search for genes by using a name or an EntrezGene identifier. <it>i</it>Terms are grouped into different categories to facilitate a quick inspection. eGIFT also links an <it>i</it>Term to sentences mentioning the term to allow users to see the relation between the <it>i</it>Term and the gene. We evaluated the precision and recall of eGIFT's <it>i</it>Terms for 40 genes; between 88% and 94% of the <it>i</it>Terms were marked as salient by our evaluators, and 94% of the UniProtKB keywords for these genes were also identified by eGIFT as <it>i</it>Terms.</p> <p>Conclusions</p> <p>Our evaluations suggest that <it>i</it>Terms capture highly-relevant aspects of genes. Furthermore, by showing sentences containing these terms, eGIFT can provide a quick description of a specific gene. eGIFT helps not only life scientists survey results of high-throughput experiments, but also annotators to find articles describing gene aspects and functions.</p>
ISSN:1471-2105