IRESpy: an XGBoost model for prediction of internal ribosome entry sites

Bibliographic Details
Main Authors: Junhui Wang, Michael Gribskov
Format: Article
Language: English
Published: BMC 2019-07-01
Series: BMC Bioinformatics
Subjects: Internal ribosome entry site (IRES), Bioinformatics, Machine learning, XGBoost
Online Access: http://link.springer.com/article/10.1186/s12859-019-2999-7
author Junhui Wang
Michael Gribskov
collection DOAJ
description Abstract
Background: Internal ribosome entry sites (IRES) are segments of mRNA found in untranslated regions that can recruit the ribosome and initiate translation independently of the 5′ cap-dependent translation initiation mechanism. IRES usually function when 5′ cap-dependent translation initiation has been blocked or repressed. They have been widely found to play important roles in viral infections and cellular processes. However, only a limited number of confirmed IRES have been reported, because confirming them requires labor-intensive, slow, and low-efficiency laboratory experiments. Bioinformatics tools have been developed, but there is no reliable online tool.
Results: This paper systematically examines the features that can distinguish IRES from non-IRES sequences. Sequence features such as kmer words, structural features such as QMFE, and sequence/structure hybrid features are evaluated as possible discriminators. They are incorporated into an IRES classifier based on XGBoost. The XGBoost model performs better than previous classifiers, with higher accuracy and much shorter computational time. Compared to previous predictors, the number of features in the model has been greatly reduced by including global kmer and structural features. The contributions of model features are well explained by LIME and SHapley Additive exPlanations. The trained XGBoost model has been implemented as a bioinformatics tool for IRES prediction, IRESpy (https://irespy.shinyapps.io/IRESpy/), which has been applied to scan the human 5′ UTR and find novel IRES segments.
Conclusions: IRESpy is a fast, reliable, high-throughput online IRES prediction tool. It is publicly available to all IRES researchers and can be used in other genomics applications such as gene annotation and analysis of differential gene expression.
first_indexed 2024-12-21T18:46:04Z
format Article
id doaj.art-359ebe2fa2684347ae00cc95fbe5b20f
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-21T18:46:04Z
publishDate 2019-07-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-359ebe2fa2684347ae00cc95fbe5b20f; 2022-12-21T18:53:53Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2019-07-01; vol. 20, no. 1, pp. 1-15; doi:10.1186/s12859-019-2999-7; IRESpy: an XGBoost model for prediction of internal ribosome entry sites; Junhui Wang (Biological Sciences Department, Purdue University); Michael Gribskov (Biological Sciences Department, Purdue University); http://link.springer.com/article/10.1186/s12859-019-2999-7; Internal ribosome entry site (IRES); Bioinformatics; Machine learning; XGBoost
title IRESpy: an XGBoost model for prediction of internal ribosome entry sites
topic Internal ribosome entry site (IRES)
Bioinformatics
Machine learning
XGBoost
url http://link.springer.com/article/10.1186/s12859-019-2999-7