Research on RNA secondary structure predicting via bidirectional recurrent neural network

Bibliographic Details
Main Authors: Weizhong Lu, Yan Cao, Hongjie Wu, Yijie Ding, Zhengwei Song, Yu Zhang, Qiming Fu, Haiou Li
Format: Article
Language: English
Published: BMC 2021-09-01
Series: BMC Bioinformatics
Subjects: Recurrent neural network; RNA secondary structure prediction; Pseudoknots
Online Access: https://doi.org/10.1186/s12859-021-04332-z
collection DOAJ
description Abstract

Background: RNA secondary structure prediction is an important research topic in bioinformatics. Predicting RNA secondary structure with pseudoknots has been proven to be NP-hard. Because of constraints inherent in the models themselves, traditional machine learning methods cannot effectively use the information in RNA sequences of differing lengths when predicting secondary structure. In addition, the numbers of paired and unpaired bases in RNA sequences differ greatly, and this imbalance between positive and negative samples can easily trap the model in a local optimum. To address these problems, this paper proposes a variable-length dynamic bidirectional Gated Recurrent Unit (VLDB GRU) model. By introducing a flag vector, the model can accept sequences of different lengths. The model also makes full use of the bases both before and after the predicted base, and it avoids losing information through truncation. A weight vector that dynamically adjusts the loss of each base during training addresses the imbalance between positive and negative samples.

Results: The proposed algorithm is compared with existing algorithms on five representative subsets of the RNA STRAND data set. The experimental results show that the method improves accuracy and the Matthews correlation coefficient by 4.7% and 11.4%, respectively.

Conclusions: The introduced flag vector allows the model to make effective use of the information before and after each base in the RNA sequence, and the introduced weight vector addresses the sample imbalance. Compared with the other algorithms, the proposed VLDB GRU algorithm achieves the best prediction results.
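The abstract states only that a flag vector lets the model accept sequences of different lengths. The following is a minimal sketch of one common way to do this, assuming one-hot base encoding, end-padding to the longest sequence in a batch, and a flag of 1 for a real base and 0 for padding. The names `one_hot` and `pad_batch` are hypothetical, and the paper's actual input features are not described in this record.

```python
from typing import List, Tuple

# Hypothetical one-hot encoding of the four RNA bases (assumption, not from the paper).
BASES = "ACGU"


def one_hot(base: str) -> List[float]:
    """Return a 4-dimensional one-hot vector for a single base."""
    return [1.0 if base == b else 0.0 for b in BASES]


def pad_batch(seqs: List[str]) -> Tuple[List[List[List[float]]], List[List[int]]]:
    """Pad a batch of RNA sequences to the length of the longest one.

    Returns the padded one-hot features and one flag vector per sequence:
    1 marks a real base, 0 marks padding that the model should ignore.
    """
    max_len = max(len(s) for s in seqs)
    features, flags = [], []
    for s in seqs:
        pad = max_len - len(s)
        # Real bases first, then all-zero vectors for the padded positions.
        features.append([one_hot(b) for b in s] + [[0.0] * len(BASES) for _ in range(pad)])
        flags.append([1] * len(s) + [0] * pad)
    return features, flags


feats, flags = pad_batch(["GGAC", "AU"])
print(flags)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```

The flag vector, rather than the zero features, is what downstream code should consult, because an all-zero input vector is not guaranteed to be neutral once it passes through learned weights.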
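The abstract says the bidirectional GRU uses the bases both before and after the predicted base, and that the flag vector lets it handle variable-length input without truncation. The sketch below illustrates how a flag vector can keep padding out of both directions of a bidirectional GRU. It assumes one common GRU formulation without biases, random untrained weights, and the hypothetical names `GRUCell` and `bidirectional`; it is not the authors' VLDB GRU implementation.

```python
import math
import random
from typing import List

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell (one common formulation, biases omitted, untrained)."""

    def __init__(self, in_dim: int, hid_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Mat:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hid_dim = hid_dim
        self.Wz, self.Uz = init(hid_dim, in_dim), init(hid_dim, hid_dim)  # update gate
        self.Wr, self.Ur = init(hid_dim, in_dim), init(hid_dim, hid_dim)  # reset gate
        self.Wh, self.Uh = init(hid_dim, in_dim), init(hid_dim, hid_dim)  # candidate state

    def step(self, x: Vec, h: Vec) -> Vec:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def bidirectional(fwd: GRUCell, bwd: GRUCell, xs: List[Vec], flags: List[int]) -> List[Vec]:
    """Concatenate forward and backward hidden states at each position.

    Positions with flag 0 are skipped by both passes and get zero outputs,
    so padding never leaks into the backward context of real bases.
    """
    n = len(xs)
    out_f: List[Vec] = [[0.0] * fwd.hid_dim for _ in range(n)]
    out_b: List[Vec] = [[0.0] * bwd.hid_dim for _ in range(n)]
    h = [0.0] * fwd.hid_dim
    for t in range(n):  # left-to-right context
        if flags[t]:
            h = fwd.step(xs[t], h)
            out_f[t] = h
    h = [0.0] * bwd.hid_dim
    for t in reversed(range(n)):  # right-to-left context
        if flags[t]:
            h = bwd.step(xs[t], h)
            out_b[t] = h
    return [f + b for f, b in zip(out_f, out_b)]


fwd, bwd = GRUCell(4, 3, seed=1), GRUCell(4, 3, seed=2)
seq = [[0, 0, 1, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0]]  # G G A C, one-hot
padded = seq + [[0, 0, 0, 0]] * 2
out = bidirectional(fwd, bwd, seq, [1, 1, 1, 1])
out_padded = bidirectional(fwd, bwd, padded, [1, 1, 1, 1, 0, 0])
```

Because the backward pass starts at the last flagged base instead of at the last padded slot, `out_padded[:4]` equals `out`: the padded and unpadded versions of the same sequence produce identical per-base features.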
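The abstract describes a weight vector that dynamically adjusts each base's loss to counter the imbalance between paired and unpaired bases, but the exact weighting rule is not given in this record. The sketch below swaps in a common inverse-frequency scheme recomputed per batch, assumes per-base binary labels (1 = paired, 0 = unpaired), and uses a weighted binary cross-entropy. `inverse_frequency_weights` and `weighted_bce` are hypothetical names; padded positions receive weight 0 through the flag vector.

```python
import math
from typing import List


def inverse_frequency_weights(labels: List[int], flags: List[int]) -> List[float]:
    """Per-base weights that up-weight the rarer class (hypothetical scheme).

    labels: 1 = paired base, 0 = unpaired base; flags: 1 = real base, 0 = padding.
    Each class receives a total weight of n_real / 2, so both classes
    contribute equally to the loss regardless of how imbalanced they are.
    """
    real = [y for y, f in zip(labels, flags) if f == 1]
    n = len(real)
    n_pos = sum(real)
    n_neg = n - n_pos
    w_pos = n / (2 * n_pos) if n_pos else 0.0
    w_neg = n / (2 * n_neg) if n_neg else 0.0
    # Multiplying by the flag zeroes out padded positions.
    return [(w_pos if y == 1 else w_neg) * f for y, f in zip(labels, flags)]


def weighted_bce(probs: List[float], labels: List[int], weights: List[float],
                 eps: float = 1e-7) -> float:
    """Weighted binary cross-entropy, normalised by the total weight."""
    total, wsum = 0.0, 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        wsum += w
    return total / wsum if wsum else 0.0


labels = [1, 0, 0, 0, 0, 0]
flags = [1, 1, 1, 1, 1, 0]
weights = inverse_frequency_weights(labels, flags)
print(weights)  # [2.5, 0.625, 0.625, 0.625, 0.625, 0.0]
loss = weighted_bce([0.5] * 6, labels, weights)
```

With one paired base among five real bases, the paired base gets four times the weight of each unpaired base, so a model cannot lower the loss much by predicting "unpaired" everywhere.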
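The results are reported as accuracy and the Matthews correlation coefficient (MCC). This record does not say whether the paper scores individual bases or base pairs; the sketch below assumes per-base paired/unpaired labels and uses the standard confusion-matrix formulas, with MCC set to 0 when its denominator is zero (a common convention). The helper names are hypothetical.

```python
import math
from typing import List, Tuple


def confusion_counts(pred: List[int], true: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels (1 = paired, 0 = unpaired)."""
    tp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, true) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, true) if p == 0 and t == 1)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 if any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
all_unpaired = [0] * 10
counts = confusion_counts(all_unpaired, true)
print(accuracy(*counts), mcc(*counts))  # 0.9 0.0
```

The example shows why MCC is reported alongside accuracy under class imbalance: a predictor that never outputs "paired" still reaches 90% accuracy here, while its MCC is 0.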
first_indexed 2024-12-13T21:59:35Z
id doaj.art-4143362213044d99b78244c1bb873eda
institution Directory Open Access Journal
issn 1471-2105
last_indexed 2024-12-13T21:59:35Z
affiliation School of Electronic and Information Engineering, Suzhou University of Science and Technology (Weizhong Lu, Yan Cao, Hongjie Wu, Yijie Ding, Zhengwei Song, Qiming Fu, Haiou Li); Suzhou Industrial Park Institute of Services Outsourcing (Yu Zhang)
topic Recurrent neural network
RNA secondary structure prediction
Pseudoknots