PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm
Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. The identification of Ψ modification sites is of great significance for disease mechanism and biological processes research in which machine learning algorithms are desir...
Main Authors: | Jujuan Zhuang, Danyang Liu, Meng Lin, Wenjing Qiu, Jinyang Liu, Size Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2021-11-01 |
Series: | Frontiers in Genetics |
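Subjects: | RNA modification, pseudouridine site prediction, feature extraction, deep learning, capsule network |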
Online Access: | https://www.frontiersin.org/articles/10.3389/fgene.2021.773882/full |
_version_ | 1818999924686061568 |
---|---|
author | Jujuan Zhuang, Danyang Liu, Meng Lin, Wenjing Qiu, Jinyang Liu, Size Chen |
author_sort | Jujuan Zhuang |
collection | DOAJ |
description | Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. The identification of Ψ modification sites is of great significance for research on disease mechanisms and biological processes, in which machine learning algorithms are desirable because laboratory exploratory techniques are expensive and time-consuming. Results: In this work, we propose a deep learning framework, called PseUdeep, to identify Ψ sites in three species: H. sapiens, S. cerevisiae, and M. musculus. In this method, three encoding methods are used to extract features from RNA sequences: one-hot encoding, K-tuple nucleotide frequency pattern, and position-specific nucleotide composition. The three feature matrices are convolved twice and fed into a capsule neural network and a bidirectional gated recurrent unit network with a self-attention mechanism for classification. Conclusion: Compared with other state-of-the-art methods, our model achieves the highest prediction accuracy on the independent testing data set S-200, where accuracy improves by 12.38%; on the independent testing data set H-200, accuracy improves by 0.68%. Moreover, the dimensions of the features we derive from the RNA sequences are only 109, 109, and 119 in H. sapiens, M. musculus, and S. cerevisiae, which are much smaller than those used in traditional algorithms. On evaluation via tenfold cross-validation and two independent testing data sets, PseUdeep outperforms the best traditional machine learning model available. PseUdeep source code and data sets are available at https://github.com/dan111262/PseUdeep. |
first_indexed | 2024-12-20T22:25:09Z |
format | Article |
id | doaj.art-893710d42b024260b70e1335792ffaf9 |
institution | Directory Open Access Journal |
issn | 1664-8021 |
language | English |
last_indexed | 2024-12-20T22:25:09Z |
publishDate | 2021-11-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-893710d42b024260b70e1335792ffaf9; 2022-12-21T19:24:51Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2021-11-01; 12; 10.3389/fgene.2021.773882; 773882; PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm; Jujuan Zhuang (0), Danyang Liu (1), Meng Lin (2), Wenjing Qiu (3, 4), Jinyang Liu (5), Size Chen (6, 7, 8); (0) College of Science, Dalian Maritime University, Dalian, China; (1) College of Science, Dalian Maritime University, Dalian, China; (2) College of Science, Dalian Maritime University, Dalian, China; (3) Electrical and Information Engineering, Anhui University of Technology, Anhui, China; (4) Geneis (Beijing) Co., Ltd., Beijing, China; (5) Geneis (Beijing) Co., Ltd., Beijing, China; (6) Department of Oncology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China; (7) Guangdong Provincial Engineering Research Center for Esophageal Cancer Precise Therapy, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China; (8) Central Laboratory, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China; https://www.frontiersin.org/articles/10.3389/fgene.2021.773882/full; RNA modification, pseudouridine site prediction, feature extraction, deep learning, capsule network |
title | PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm |
topic | RNA modification, pseudouridine site prediction, feature extraction, deep learning, capsule network |
url | https://www.frontiersin.org/articles/10.3389/fgene.2021.773882/full |
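
The description field names one-hot encoding and a K-tuple nucleotide frequency pattern among the sequence encodings. The following is a minimal sketch of what those two encodings commonly look like, not the PseUdeep implementation: the function names `one_hot` and `ktuple_frequencies`, the A/C/G/U column order, the choice of k = 2, and the example sequence are all assumptions made for illustration, and the exact definitions used in the paper may differ.

```python
from collections import Counter
from itertools import product

# Assumed alphabet and column order; the paper may use a different ordering.
NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Encode each nucleotide as a 4-element indicator vector (A, C, G, U)."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def ktuple_frequencies(seq, k=2):
    """Return the relative frequency of every possible k-tuple in seq."""
    seq = seq.upper()
    total = len(seq) - k + 1  # number of overlapping windows of length k
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    all_tuples = ("".join(t) for t in product(NUCLEOTIDES, repeat=k))
    return {t: counts[t] / total for t in all_tuples}


# Hypothetical example sequence, chosen only to keep the output small.
example = "GAUUC"
encoded = one_hot(example)
freqs = ktuple_frequencies(example, k=2)
```

Here `encoded` holds one row per nucleotide and `freqs` maps each of the 16 possible dinucleotides to its share of the overlapping windows in the sequence. The third encoding named in the description, position-specific nucleotide composition, depends on labeled training sequences and is not sketched here.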