A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature

Bibliographic Details
Main Authors: Pratik Devkota, Somya D. Mohanty, Prashanti Manda
Format: Article
Language: English
Published: BMC 2022-09-01
Series: BioData Mining
Subjects: Deep learning, Gene ontology, Automated annotation, Scientific literature
Online Access: https://doi.org/10.1186/s13040-022-00310-0
_version_ 1797998379896143872
author Pratik Devkota
Somya D. Mohanty
Prashanti Manda
collection DOAJ
description Abstract. Background: Annotating scientific literature with ontology concepts is a critical task for knowledge discovery in biology and several other domains. Ontology-based annotations can power large-scale comparative analyses in applications ranging from evolutionary phenotypes to rare human diseases to the study of protein functions. Computational methods for tagging scientific text with ontology terms have included lexical/syntactic methods, traditional machine learning, and, most recently, deep learning. Results: Here, we present state-of-the-art deep learning architectures based on Gated Recurrent Units for annotating text with ontology concepts. We use the Colorado Richly Annotated Full Text Corpus (CRAFT) as a gold standard for training and testing. To increase prediction accuracy, we explore a number of additional information sources, including NCBI’s BioThesaurus and the Unified Medical Language System (UMLS), to augment the information from CRAFT. Our best model achieves an F1 score and a semantic similarity of 0.84. Conclusion: The results shown here underscore the impact of using deep learning architectures for automatically recognizing ontology concepts in literature. Augmenting the models with biological information beyond that present in the gold standard corpus yields a distinct improvement in prediction accuracy.
first_indexed 2024-04-11T10:47:49Z
format Article
id doaj.art-98706317c94d4801bff23747c41b4db2
institution Directory of Open Access Journals
issn 1756-0381
language English
last_indexed 2024-04-11T10:47:49Z
publishDate 2022-09-01
publisher BMC
record_format Article
series BioData Mining
spelling doaj.art-98706317c94d4801bff23747c41b4db2; 2022-12-22T04:29:00Z; eng; BMC; BioData Mining; 1756-0381; 2022-09-01; 15; 1; 1; 23; 10.1186/s13040-022-00310-0; A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature; Pratik Devkota (Department of Computer Science, University of North Carolina at Greensboro); Somya D. Mohanty (Department of Computer Science, University of North Carolina at Greensboro); Prashanti Manda (Informatics and Analytics, University of North Carolina at Greensboro); https://doi.org/10.1186/s13040-022-00310-0; Deep learning; Gene ontology; Automated annotation; Scientific literature
title A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature
topic Deep learning
Gene ontology
Automated annotation
Scientific literature
url https://doi.org/10.1186/s13040-022-00310-0