Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties

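Dihydrouridine (D) is an abundant post-transcriptional modification present in transfer RNA from eukaryotes, bacteria, and archaea. D has contributed to treatments for cancerous diseases. Therefore, the precise detection of D modification sites can enable further understanding of its functional roles. Traditional experimental techniques to identify D are laborious and time-consuming. In addition, there are few computational tools for such analysis. In this study, we utilized eleven sequence-derived feature extraction methods and implemented five popular machine learning algorithms to identify an optimal model. During data preprocessing, data were partitioned for training and testing. Oversampling was also adopted to reduce the effect of the imbalance between positive and negative samples. The best-performing model was obtained through a combination of random forest and nucleotide chemical property modeling. The optimized model presented high sensitivity and specificity values of 0.9688 and 0.9706 in independent tests, respectively. Our proposed model surpassed published tools in independent tests. Furthermore, a series of validations across several aspects was conducted in order to demonstrate the robustness and reliability of our model.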
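
The description names nucleotide chemical property (NCP) features and reports sensitivity and specificity, but this record does not spell out the encoding itself. The sketch below is only an illustration: it assumes the commonly used three-bit NCP scheme (ring structure, functional group, hydrogen-bond strength), and the helper names `encode_ncp` and `sensitivity_specificity` are hypothetical, not taken from the article.

```python
# Illustrative sketch, not the authors' code.
# Assumed mapping: (purine ring, amino group, weak hydrogen bonding).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_ncp(sequence):
    """Turn an RNA sequence into a flat list with three values per nucleotide."""
    features = []
    # Treat DNA-style T as U; mapping unknown symbols to zeros is an assumption.
    for base in sequence.upper().replace("T", "U"):
        features.extend(NCP.get(base, (0, 0, 0)))
    return features


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

Feature vectors built this way could then be passed to any standard classifier; the window length around each candidate site and the classifier settings are not given in this record and would need to come from the article.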

Bibliographic Details
Main Authors:	Huan Zhu, Chun-Yan Ao, Yi-Jie Ding, Hong-Xia Hao, Liang Yu
Format:	Article
Language:	English
Published:	MDPI AG 2022-03-01
Series:	International Journal of Molecular Sciences
Subjects:	dihydrouridine; random forest; nucleotide chemical properties; prediction; oversample
Online Access:	https://www.mdpi.com/1422-0067/23/6/3044

_version_	1797470973537026048
author	Huan Zhu, Chun-Yan Ao, Yi-Jie Ding, Hong-Xia Hao, Liang Yu
collection	DOAJ
description	Dihydrouridine (D) is an abundant post-transcriptional modification present in transfer RNA from eukaryotes, bacteria, and archaea. D has contributed to treatments for cancerous diseases. Therefore, the precise detection of D modification sites can enable further understanding of its functional roles. Traditional experimental techniques to identify D are laborious and time-consuming. In addition, there are few computational tools for such analysis. In this study, we utilized eleven sequence-derived feature extraction methods and implemented five popular machine learning algorithms to identify an optimal model. During data preprocessing, data were partitioned for training and testing. Oversampling was also adopted to reduce the effect of the imbalance between positive and negative samples. The best-performing model was obtained through a combination of random forest and nucleotide chemical property modeling. The optimized model presented high sensitivity and specificity values of 0.9688 and 0.9706 in independent tests, respectively. Our proposed model surpassed published tools in independent tests. Furthermore, a series of validations across several aspects was conducted in order to demonstrate the robustness and reliability of our model.
first_indexed	2024-03-09T19:42:59Z
format	Article
id	doaj.art-049d1eb96de041688742650d94292743
institution	Directory of Open Access Journals
issn	1661-6596 1422-0067
language	English
last_indexed	2024-03-09T19:42:59Z
publishDate	2022-03-01
publisher	MDPI AG
record_format	Article
series	International Journal of Molecular Sciences
spelling	doaj.art-049d1eb96de041688742650d94292743; 2023-11-24T01:31:06Z; eng; MDPI AG; International Journal of Molecular Sciences; ISSN 1661-6596, 1422-0067; 2022-03-01; vol. 23, no. 6, article 3044; doi:10.3390/ijms23063044; Huan Zhu, Chun-Yan Ao, Hong-Xia Hao, Liang Yu (School of Computer Science and Technology, Xidian University, Xi’an 710071, China); Yi-Jie Ding (Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China); https://www.mdpi.com/1422-0067/23/6/3044
title	Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties
topic	dihydrouridine; random forest; nucleotide chemical properties; prediction; oversample
url	https://www.mdpi.com/1422-0067/23/6/3044

Similar Items