Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature

Abstract Background The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these process...

Bibliographic Details
Main Authors: H.-M. Müller, K. M. Van Auken, Y. Li, P. W. Sternberg
Format: Article
Language: English
Published: BMC 2018-03-01
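Series: BMC Bioinformatics
Subjects: Literature curation; Text mining; Information retrieval; Information extraction; Literature search engine; Ontology
Online Access: http://link.springer.com/article/10.1186/s12859-018-2103-8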
author H.-M. Müller
K. M. Van Auken
Y. Li
P. W. Sternberg
collection DOAJ
description Abstract
Background: The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved.
Results: We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium.
Conclusion: Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements, and then send those annotations to any database in the world.
Textpresso Central URL: http://www.textpresso.org/tpc
first_indexed 2024-04-12T21:57:39Z
format Article
id doaj.art-b5a1326ab86a430ea489914fe7eca701
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-04-12T21:57:39Z
publishDate 2018-03-01
publisher BMC
series BMC Bioinformatics
spelling doaj.art-b5a1326ab86a430ea489914fe7eca701; 2022-12-22T03:15:15Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-03-01; vol. 19, no. 1, pp. 1-16; 10.1186/s12859-018-2103-8
affiliation H.-M. Müller, K. M. Van Auken, Y. Li, P. W. Sternberg: Division of Biology and Biological Engineering, California Institute of Technology
title Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature
topic Literature curation
Text mining
Information retrieval
Information extraction
Literature search engine
Ontology
url http://link.springer.com/article/10.1186/s12859-018-2103-8