PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm

Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. The identification of Ψ modification sites is of great significance for disease mechanism and biological processes research in which machine learning algorithms are desir...

Bibliographic Details
Main Authors: Jujuan Zhuang, Danyang Liu, Meng Lin, Wenjing Qiu, Jinyang Liu, Size Chen
Format: Article
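Language: English
Published: Frontiers Media S.A. 2021-11-01
Series: Frontiers in Genetics
Subjects: RNA modification; pseudouridine site prediction; feature extraction; deep learning; capsule network
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2021.773882/full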
collection DOAJ
description Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. Identifying Ψ modification sites matters for research into disease mechanisms and biological processes, and machine learning algorithms are desirable for this task because exploratory laboratory techniques are expensive and time-consuming.
Results: In this work, we propose a deep learning framework, called PseUdeep, to identify Ψ sites in three species: H. sapiens, S. cerevisiae, and M. musculus. Three encoding methods are used to extract features from RNA sequences: one-hot encoding, K-tuple nucleotide frequency pattern, and position-specific nucleotide composition. The three feature matrices are convolved twice and fed into a capsule neural network and a bidirectional gated recurrent unit network with a self-attention mechanism for classification.
Conclusion: Compared with other state-of-the-art methods, our model achieves the highest prediction accuracy on the independent testing data sets: on S-200 the accuracy improves by 12.38%, and on H-200 it improves by 0.68%. Moreover, the dimensions of the features we derive from the RNA sequences are only 109, 109, and 119 for H. sapiens, M. musculus, and S. cerevisiae, respectively, which is much smaller than those used in traditional algorithms. On evaluation via tenfold cross-validation and two independent testing data sets, PseUdeep outperforms the best traditional machine learning model available. PseUdeep source code and data sets are available at https://github.com/dan111262/PseUdeep.
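Two of the three encodings named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the GitHub repository for that): the function names `one_hot_encode` and `k_tuple_frequencies`, the A/C/G/U column order, the default `k=2`, and the handling of unknown symbols are all assumptions made for illustration. Position-specific nucleotide composition is left out because the record does not define it.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # assumed column order; the record does not specify one


def one_hot_encode(seq):
    """Map each nucleotide to a 4-dimensional indicator vector (A, C, G, U)."""
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    encoded = []
    for nt in seq.upper():
        row = [0, 0, 0, 0]
        if nt in index:  # assumption: unknown symbols (e.g. N) stay all-zero
            row[index[nt]] = 1
        encoded.append(row)
    return encoded


def k_tuple_frequencies(seq, k=2):
    """Relative frequency of every possible k-tuple over overlapping windows."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product(NUCLEOTIDES, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing unknown symbols are skipped
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}
```

For example, `one_hot_encode("ACGU")` yields one indicator row per position, and `k_tuple_frequencies("AUAU", 2)` assigns 2/3 to `AU` and 1/3 to `UA`, with the other fourteen dinucleotides at zero.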
first_indexed 2024-12-20T22:25:09Z
id doaj.art-893710d42b024260b70e1335792ffaf9
institution Directory of Open Access Journals
issn 1664-8021
last_indexed 2024-12-20T22:25:09Z
volume 12
doi 10.3389/fgene.2021.773882
article_number 773882
affiliation Jujuan Zhuang: College of Science, Dalian Maritime University, Dalian, China
Danyang Liu: College of Science, Dalian Maritime University, Dalian, China
Meng Lin: College of Science, Dalian Maritime University, Dalian, China
Wenjing Qiu: Electrical and Information Engineering, Anhui University of Technology, Anhui, China
Wenjing Qiu: Geneis (Beijing) Co., Ltd., Beijing, China
Jinyang Liu: Geneis (Beijing) Co., Ltd., Beijing, China
Size Chen: Department of Oncology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
Size Chen: Guangdong Provincial Engineering Research Center for Esophageal Cancer Precise Therapy, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
Size Chen: Central Laboratory, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
topic RNA modification
pseudouridine site prediction
feature extraction
deep learning
capsule network