A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature
Abstract Background Annotating scientific literature with ontology concepts is a critical task in biology and several other domains for knowledge discovery. Ontology based annotations can power large-scale comparative analyses in a wide range of applications ranging from evolutionary phenotypes to r...
Main Authors: | Pratik Devkota, Somya D. Mohanty, Prashanti Manda |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2022-09-01 |
Series: | BioData Mining |
Subjects: | Deep learning, Gene ontology, Automated annotation, Scientific literature |
Online Access: | https://doi.org/10.1186/s13040-022-00310-0 |
_version_ | 1797998379896143872 |
---|---|
author | Pratik Devkota; Somya D. Mohanty; Prashanti Manda |
author_facet | Pratik Devkota; Somya D. Mohanty; Prashanti Manda |
author_sort | Pratik Devkota |
collection | DOAJ |
description | Abstract. Background: Annotating scientific literature with ontology concepts is a critical task for knowledge discovery in biology and several other domains. Ontology-based annotations can power large-scale comparative analyses in applications ranging from evolutionary phenotypes to rare human diseases to the study of protein functions. Computational methods for tagging scientific text with ontology terms have included lexical/syntactic methods, traditional machine learning, and, most recently, deep learning. Results: Here, we present state-of-the-art deep learning architectures based on Gated Recurrent Units for annotating text with ontology concepts. We use the Colorado Richly Annotated Full Text Corpus (CRAFT) as a gold standard for training and testing. We explore a number of additional information sources, including NCBI’s BioThesaurus and the Unified Medical Language System (UMLS), to augment the information from CRAFT and increase prediction accuracy. Our best model achieves 0.84 F1 and semantic similarity. Conclusion: The results shown here underscore the impact of using deep learning architectures for automatically recognizing ontology concepts from literature. Augmenting the models with biological information beyond that present in the gold standard corpus yields a distinct improvement in prediction accuracy. |
first_indexed | 2024-04-11T10:47:49Z |
format | Article |
id | doaj.art-98706317c94d4801bff23747c41b4db2 |
institution | Directory Open Access Journal |
issn | 1756-0381 |
language | English |
last_indexed | 2024-04-11T10:47:49Z |
publishDate | 2022-09-01 |
publisher | BMC |
record_format | Article |
series | BioData Mining |
spelling | doaj.art-98706317c94d4801bff23747c41b4db2; 2022-12-22T04:29:00Z; eng; BMC; BioData Mining; 1756-0381; 2022-09-01; 15; 1; 1; 23; 10.1186/s13040-022-00310-0; A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature; Pratik Devkota (Department of Computer Science, University of North Carolina at Greensboro); Somya D. Mohanty (Department of Computer Science, University of North Carolina at Greensboro); Prashanti Manda (Informatics and Analytics, University of North Carolina at Greensboro); https://doi.org/10.1186/s13040-022-00310-0; Deep learning; Gene ontology; Automated annotation; Scientific literature |
title | A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature |
topic | Deep learning; Gene ontology; Automated annotation; Scientific literature |
url | https://doi.org/10.1186/s13040-022-00310-0 |