Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation
DNA N4-methylcytosine (4mC) is an important genetic modification and plays crucial roles in differentiation between self and non-self DNA and in controlling DNA replication, cell cycle, and gene-expression levels. Accurate 4mC site identification is fundamental to improve the understanding of 4mC bi...
Main Authors: | Balachandran Manavalan, Shaherin Basith, Tae Hwan Shin, Leyi Wei, Gwang Lee |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2019-06-01 |
Series: | Molecular Therapy: Nucleic Acids |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2162253119300927 |
_version_ | 1819128204748652544 |
---|---|
author | Balachandran Manavalan; Shaherin Basith; Tae Hwan Shin; Leyi Wei; Gwang Lee |
author_sort | Balachandran Manavalan |
collection | DOAJ |
description | DNA N4-methylcytosine (4mC) is an important genetic modification that plays crucial roles in distinguishing self from non-self DNA and in controlling DNA replication, the cell cycle, and gene-expression levels. Accurate identification of 4mC sites is fundamental to improving our understanding of 4mC biological functions and mechanisms. Hence, in silico approaches are needed for efficient, high-throughput 4mC site identification. Although some bioinformatic tools have been developed for this purpose, their prediction accuracy and generalizability need improvement before they are practical to use. We therefore propose Meta-4mCpred, a meta-predictor for 4mC site prediction. In Meta-4mCpred, we employed a feature representation learning scheme and generated 56 probabilistic features based on four different machine-learning algorithms and seven feature encodings covering diverse sequence information, including compositional, physicochemical, and position-specific information. These probabilistic features were then used as input to a support vector machine to develop the final meta-predictor. To the best of our knowledge, this is the first meta-predictor for 4mC site prediction. Cross-validation results show that Meta-4mCpred achieved an overall average accuracy of 84.2% across six species, which is ∼2%–4% higher than that attained by state-of-the-art predictors. Furthermore, Meta-4mCpred achieved an overall average accuracy of 86% on independent datasets, which is over 4% higher than that of state-of-the-art predictors. A user-friendly web server implementing Meta-4mCpred is freely accessible at http://thegleelab.org/Meta-4mCpred. Keywords: DNA N4-methylcytosine, feature representation learning, probabilistic features, support vector machine, meta-predictor |
first_indexed | 2024-12-22T08:24:07Z |
format | Article |
id | doaj.art-7a2c98b827304e76a0fb01e7493c5a1b |
institution | Directory Open Access Journal |
issn | 2162-2531 |
language | English |
last_indexed | 2024-12-22T08:24:07Z |
publishDate | 2019-06-01 |
publisher | Elsevier |
record_format | Article |
series | Molecular Therapy: Nucleic Acids |
spelling | doaj.art-7a2c98b827304e76a0fb01e7493c5a1b; 2022-12-21T18:32:40Z; eng; Elsevier; Molecular Therapy: Nucleic Acids; 2162-2531; 2019-06-01; vol. 16, pp. 733–744; Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation. Authors and affiliations: Balachandran Manavalan (Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea); Shaherin Basith (Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea); Tae Hwan Shin (Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea); Leyi Wei, corresponding author (School of Computer Science and Technology, Tianjin University, China); Gwang Lee, corresponding author (Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea) |
title | Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation |
url | http://www.sciencedirect.com/science/article/pii/S2162253119300927 |
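The description above outlines a two-level scheme: sequences are encoded in several ways, base classifiers turn each encoding into a probability, and those probabilities become the input features of a final support vector machine. The following is a minimal sketch of the feature-assembly step only, not the authors' implementation. The two encoders, the function names, and the stand-in "models" (plain lambdas rather than trained classifiers) are all assumptions made for illustration. The record reports four algorithms and seven encodings yielding 56 probabilistic features, but it does not say how that count is reached, so the sketch makes no attempt to reproduce it.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_composition(seq, k=2):
    """Compositional encoding: normalized frequency of every possible k-mer."""
    windows = len(seq) - k + 1
    if windows < 1:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[kmer] += 1
    return [counts[m] / windows for m in kmers]


def one_hot_positions(seq):
    """Position-specific encoding: four indicator values per nucleotide."""
    return [1.0 if base == letter else 0.0 for base in seq for letter in ALPHABET]


def probabilistic_features(seq, encoders, base_models):
    """Collect one predicted probability per (encoding, algorithm) pair.

    encoders:    dict of encoding name -> function(seq) -> list of floats
    base_models: dict of (encoding name, algorithm name) -> function(features) -> float
    The returned list is what a meta-level classifier would be trained on.
    """
    features = []
    for (enc_name, _algo), predict in sorted(base_models.items()):
        features.append(predict(encoders[enc_name](seq)))
    return features


encoders = {"kmer": kmer_composition, "onehot": one_hot_positions}

# Hypothetical stand-ins for trained base classifiers; real ones would be
# fitted models that return the probability of the positive (4mC) class.
base_models = {
    ("kmer", "model_a"): lambda x: max(x),
    ("onehot", "model_b"): lambda x: sum(x) / len(x),
}

meta = probabilistic_features("ACGTAC", encoders, base_models)
```

In a full pipeline, vectors like `meta` would be computed for every training sequence, ideally from cross-validated base-model predictions so the meta-level classifier does not see probabilities produced on data the base models were fitted to.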