Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature
Abstract Background The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these process...
Main Authors: | H.-M. Müller, K. M. Van Auken, Y. Li, P. W. Sternberg |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2018-03-01 |
Series: | BMC Bioinformatics |
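Subjects: | Literature curation; Text mining; Information retrieval; Information extraction; Literature search engine; Ontology |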
Online Access: | http://link.springer.com/article/10.1186/s12859-018-2103-8 |
_version_ | 1811270260438859776 |
---|---|
author | H.-M. Müller; K. M. Van Auken; Y. Li; P. W. Sternberg |
author_facet | H.-M. Müller; K. M. Van Auken; Y. Li; P. W. Sternberg |
author_sort | H.-M. Müller |
collection | DOAJ |
description | Abstract. Background: The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. Results: We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium. Conclusion: Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements, and then send those annotations to any database in the world. Textpresso Central URL: http://www.textpresso.org/tpc |
first_indexed | 2024-04-12T21:57:39Z |
format | Article |
id | doaj.art-b5a1326ab86a430ea489914fe7eca701 |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-04-12T21:57:39Z |
publishDate | 2018-03-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-b5a1326ab86a430ea489914fe7eca701; 2022-12-22T03:15:15Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-03-01; 191116; 10.1186/s12859-018-2103-8; Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature; H.-M. Müller (Division of Biology and Biological Engineering, California Institute of Technology); K. M. Van Auken (Division of Biology and Biological Engineering, California Institute of Technology); Y. Li (Division of Biology and Biological Engineering, California Institute of Technology); P. W. Sternberg (Division of Biology and Biological Engineering, California Institute of Technology); Abstract. Background: The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. Results: We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium. Conclusion: Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements, and then send those annotations to any database in the world. Textpresso Central URL: http://www.textpresso.org/tpc; http://link.springer.com/article/10.1186/s12859-018-2103-8; Literature curation; Text mining; Information retrieval; Information extraction; Literature search engine; Ontology |
spellingShingle | H.-M. Müller; K. M. Van Auken; Y. Li; P. W. Sternberg; Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature; BMC Bioinformatics; Literature curation; Text mining; Information retrieval; Information extraction; Literature search engine; Ontology |
title | Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature |
title_full | Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature |
title_fullStr | Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature |
title_full_unstemmed | Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature |
title_short | Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature |
title_sort | textpresso central a customizable platform for searching text mining viewing and curating biomedical literature |
topic | Literature curation; Text mining; Information retrieval; Information extraction; Literature search engine; Ontology |
url | http://link.springer.com/article/10.1186/s12859-018-2103-8 |
work_keys_str_mv | AT hmmuller textpressocentralacustomizableplatformforsearchingtextminingviewingandcuratingbiomedicalliterature AT kmvanauken textpressocentralacustomizableplatformforsearchingtextminingviewingandcuratingbiomedicalliterature AT yli textpressocentralacustomizableplatformforsearchingtextminingviewingandcuratingbiomedicalliterature AT pwsternberg textpressocentralacustomizableplatformforsearchingtextminingviewingandcuratingbiomedicalliterature |