Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties

Dihydrouridine (D) is an abundant post-transcriptional modification present in transfer RNA from eukaryotes, bacteria, and archaea. D has contributed to treatments for cancerous diseases. Therefore, the precise detection of D modification sites can enable further understanding of its functional role...

Bibliographic Details
Main Authors: Huan Zhu, Chun-Yan Ao, Yi-Jie Ding, Hong-Xia Hao, Liang Yu
Format: Article
Language: English
Published: MDPI AG 2022-03-01
Series: International Journal of Molecular Sciences
Subjects: dihydrouridine; random forest; nucleotide chemical properties; prediction; oversample
Online Access: https://www.mdpi.com/1422-0067/23/6/3044
author Huan Zhu
Chun-Yan Ao
Yi-Jie Ding
Hong-Xia Hao
Liang Yu
collection DOAJ
description Dihydrouridine (D) is an abundant post-transcriptional modification present in transfer RNA from eukaryotes, bacteria, and archaea. D has contributed to treatments for cancerous diseases. Therefore, the precise detection of D modification sites can enable further understanding of its functional roles. Traditional experimental techniques to identify D are laborious and time-consuming. In addition, there are few computational tools for such analysis. In this study, we utilized eleven sequence-derived feature extraction methods and implemented five popular machine learning algorithms to identify an optimal model. During data preprocessing, data were partitioned for training and testing. Oversampling was also adopted to reduce the effect of the imbalance between positive and negative samples. The best-performing model was obtained through a combination of random forest and nucleotide chemical property modeling. The optimized model presented high sensitivity and specificity values of 0.9688 and 0.9706 in independent tests, respectively. Our proposed model surpassed published tools in independent tests. Furthermore, a series of validations across several aspects was conducted to demonstrate the robustness and reliability of our model.
first_indexed 2024-03-09T19:42:59Z
format Article
id doaj.art-049d1eb96de041688742650d94292743
institution Directory Open Access Journal
issn 1661-6596
1422-0067
language English
last_indexed 2024-03-09T19:42:59Z
publishDate 2022-03-01
publisher MDPI AG
record_format Article
series International Journal of Molecular Sciences
spelling doaj.art-049d1eb96de041688742650d94292743 2023-11-24T01:31:06Z eng
MDPI AG
International Journal of Molecular Sciences
1661-6596
1422-0067
2022-03-01, volume 23, issue 6, article 3044
10.3390/ijms23063044
Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties
Huan Zhu (School of Computer Science and Technology, Xidian University, Xi’an 710071, China)
Chun-Yan Ao (School of Computer Science and Technology, Xidian University, Xi’an 710071, China)
Yi-Jie Ding (Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China)
Hong-Xia Hao (School of Computer Science and Technology, Xidian University, Xi’an 710071, China)
Liang Yu (School of Computer Science and Technology, Xidian University, Xi’an 710071, China)
https://www.mdpi.com/1422-0067/23/6/3044
dihydrouridine
random forest
nucleotide chemical properties
prediction
oversample
title Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties
topic dihydrouridine
random forest
nucleotide chemical properties
prediction
oversample
url https://www.mdpi.com/1422-0067/23/6/3044
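
The description field of this record mentions nucleotide chemical property features and reports sensitivity and specificity, but the record itself gives no implementation details. The following is a minimal sketch of how such an encoding and the two metrics are commonly computed, not the authors' code. The 3-bit mapping (ring structure, functional group, hydrogen-bond strength) is the encoding widely used in RNA modification predictors and is assumed here; the function names `encode_ncp` and `sensitivity_specificity` are hypothetical.

```python
# Illustrative sketch only; names and the exact mapping are assumptions,
# not taken from the article described by this record.

# Commonly used nucleotide chemical property (NCP) encoding:
# (ring structure: purine=1, functional group: amino=1, hydrogen bonding: weak=1)
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_ncp(seq):
    """Encode an RNA sequence as a flat list with 3 values per nucleotide."""
    features = []
    # Treat DNA-style T as U; unknown symbols are encoded as (0, 0, 0).
    for base in seq.upper().replace("T", "U"):
        features.extend(NCP.get(base, (0, 0, 0)))
    return features


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 1 (site) and 0 (non-site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return sn, sp
```

Feature vectors produced this way could then be passed to any random forest implementation; the choice of library, window length, and oversampling method is not stated in this record and is left open here.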