iPseU-TWSVM: Identification of RNA pseudouridine sites based on TWSVM

Biological sequence analysis is an important basic research work in the field of bioinformatics. With the explosive growth of data, machine learning methods play an increasingly important role in biological sequence analysis. By constructing a classifier for prediction, the input sequence feature ve...


Bibliographic Details
Main Authors:	Mingshuai Chen, Xin Zhang, Ying Ju, Qing Liu, Yijie Ding
Format:	Article
Language:	English
Published:	AIMS Press 2022-09-01
Series:	Mathematical Biosciences and Engineering
Subjects:	pseudouridine sites; twin support vector machine; max-relevance and min-redundancy; RNA
Online Access:	https://www.aimspress.com/article/doi/10.3934/mbe.2022644?viewType=HTML

author	Mingshuai Chen, Xin Zhang, Ying Ju, Qing Liu, Yijie Ding
collection	DOAJ
description	Biological sequence analysis is an important basic research work in the field of bioinformatics. With the explosive growth of data, machine learning methods play an increasingly important role in biological sequence analysis. By constructing a classifier for prediction, the input sequence feature vector is predicted and evaluated, and the knowledge of gene structure, function and evolution is obtained from a large amount of sequence information, which lays a foundation for researchers to carry out in-depth research. At present, many machine learning methods have been applied to biological sequence analysis such as RNA gene recognition and protein secondary structure prediction. As a biological sequence, RNA plays an important biological role in the encoding, decoding, regulation and expression of genes. The analysis of RNA data is currently carried out from the aspects of structure and function, including secondary structure prediction, non-coding RNA identification and functional site prediction. Pseudouridine (Ψ) is the most widespread and rich RNA modification and has been discovered in a variety of RNAs. It is highly essential for the study of related functional mechanisms and disease diagnosis to accurately identify Ψ sites in RNA sequences. At present, several computational approaches have been suggested as an alternative to experimental methods to detect Ψ sites, but there is still potential for improvement in their performance. In this study, we present a model based on twin support vector machine (TWSVM) for Ψ site identification. The model combines a variety of feature representation techniques and uses the max-relevance and min-redundancy methods to obtain the optimum feature subset for training. The independent testing accuracy is improved by 3.4% in comparison to current advanced Ψ site predictors. The outcomes demonstrate that our model has better generalization performance and improves the accuracy of Ψ site identification. iPseU-TWSVM can be a helpful tool to identify Ψ sites.
first_indexed	2024-04-12T20:20:56Z
format	Article
id	doaj.art-e5edf62019494d7fb4384055a021070e
institution	Directory Open Access Journal
issn	1551-0018
language	English
last_indexed	2024-04-12T20:20:56Z
publishDate	2022-09-01
publisher	AIMS Press
record_format	Article
series	Mathematical Biosciences and Engineering
spelling	doaj.art-e5edf62019494d7fb4384055a021070e; 2022-12-22T03:17:59Z; eng; AIMS Press; Mathematical Biosciences and Engineering; ISSN 1551-0018; 2022-09-01; vol. 19, no. 12, pp. 13829-13850; DOI 10.3934/mbe.2022644; iPseU-TWSVM: Identification of RNA pseudouridine sites based on TWSVM; Mingshuai Chen, Xin Zhang, Ying Ju, Qing Liu, Yijie Ding; affiliations as listed: 1. Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; 2. Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China; 3. Beidahuang Industry Group General Hospital, Harbin, China; 4. School of Informatics, Xiamen University, Xiamen, China; 5. Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, China; https://www.aimspress.com/article/doi/10.3934/mbe.2022644?viewType=HTML; pseudouridine sites; twin support vector machine; max-relevance and min-redundancy; RNA
title	iPseU-TWSVM: Identification of RNA pseudouridine sites based on TWSVM
topic	pseudouridine sites; twin support vector machine; max-relevance and min-redundancy; RNA
url	https://www.aimspress.com/article/doi/10.3934/mbe.2022644?viewType=HTML
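
The description above names max-relevance and min-redundancy (mRMR) feature selection but gives no implementation details. The following is only a minimal sketch of the general greedy mRMR idea, not the authors' code. It assumes discrete feature values and uses mutual information for both relevance and redundancy; the function names `mutual_information` and `mrmr_select` and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedily pick up to k column indices by relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        # Relevance to the labels, penalised by average overlap with chosen features.
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data (assumption): the label is the AND of two independent binary features.
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [0, 1, 0, 1, 0, 1, 0, 1]
    labels = [x & y for x, y in zip(a, b)]
    columns = [a, list(a), b]  # column 1 duplicates column 0
    print(sorted(mrmr_select(columns, labels, k=2)))
```

On this toy input the duplicated column is penalised by its redundancy with the first pick, so the two selected indices cover both independent features. Real pipelines typically work with continuous descriptors, which would need discretisation or a different dependence measure.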


Similar Items