Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation

DNA N4-methylcytosine (4mC) is an important genetic modification that plays crucial roles in distinguishing self from non-self DNA and in controlling DNA replication, the cell cycle, and gene-expression levels. Accurate 4mC site identification is fundamental to improving the understanding of 4mC bi...


Bibliographic Details
Main Authors: Balachandran Manavalan, Shaherin Basith, Tae Hwan Shin, Leyi Wei, Gwang Lee
Format: Article
Language: English
Published: Elsevier 2019-06-01
Series: Molecular Therapy: Nucleic Acids
Online Access: http://www.sciencedirect.com/science/article/pii/S2162253119300927
collection DOAJ
description DNA N4-methylcytosine (4mC) is an important genetic modification that plays crucial roles in distinguishing self from non-self DNA and in controlling DNA replication, the cell cycle, and gene-expression levels. Accurate 4mC site identification is fundamental to improving the understanding of 4mC biological functions and mechanisms. Hence, it is necessary to develop in silico approaches for efficient and high-throughput 4mC site identification. Although some bioinformatic tools have been developed for this purpose, their prediction accuracy and generalizability require improvement before they are useful in practical applications. We therefore propose Meta-4mCpred, a meta-predictor for 4mC site prediction. In Meta-4mCpred, we employed a feature representation learning scheme and generated 56 probabilistic features based on four different machine-learning algorithms and seven feature encodings covering diverse sequence information, including compositional, physicochemical, and position-specific information. The probabilistic features were then used as input to a support vector machine to build the final meta-predictor. To the best of our knowledge, this is the first meta-predictor for 4mC site prediction. Cross-validation results show that Meta-4mCpred achieved an overall average accuracy of 84.2% across six different species, which is ∼2%–4% higher than that attainable using the state-of-the-art predictors. Furthermore, Meta-4mCpred achieved an overall average accuracy of 86% on independent datasets, which is over 4% higher than that of the state-of-the-art predictors. A user-friendly webserver implementing Meta-4mCpred is freely accessible at http://thegleelab.org/Meta-4mCpred.
Keywords: DNA N4-methylcytosine, feature representation learning, probabilistic features, support vector machine, meta-predictor
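The two-layer idea in the description (encode each sequence several ways, train a base model per encoding and algorithm, then collect the predicted probabilities as a new feature vector) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record does not name the seven encodings or the four algorithms, so the k-mer and one-hot encodings, the toy `CentroidScorer` learner, and the nine-base sequences are all assumptions made for the example. The final support vector machine is not reproduced; the sketch stops at the probabilistic feature vector that such a meta-learner would consume.

```python
from itertools import product

BASES = "ACGT"


def kmer_composition(seq, k):
    """Compositional encoding (assumed example): relative k-mer frequencies."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        sub = seq[i:i + k]
        if sub in counts:  # skip windows containing non-ACGT symbols
            counts[sub] += 1
    return [counts[m] / windows for m in kmers]


def binary_profile(seq):
    """Position-specific encoding (assumed example): one-hot per position."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


class CentroidScorer:
    """Toy stand-in for a base learner; NOT one of the paper's algorithms."""

    def __init__(self, p):
        self.p = p  # Minkowski order: 1 = Manhattan, 2 = Euclidean

    def _dist(self, a, b):
        return sum(abs(x - y) ** self.p for x, y in zip(a, b)) ** (1 / self.p)

    def fit(self, X, y):
        # Store the mean feature vector of each class.
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        # Score in [0, 1]; closer to the positive centroid means nearer 1.
        d_neg = self._dist(x, self.centroids[0])
        d_pos = self._dist(x, self.centroids[1])
        total = d_neg + d_pos
        return 0.5 if total == 0 else d_neg / total


ENCODINGS = [
    lambda s: kmer_composition(s, 1),
    lambda s: kmer_composition(s, 2),
    binary_profile,
]
LEARNERS = [lambda: CentroidScorer(1), lambda: CentroidScorer(2)]


def fit_base_models(seqs, labels):
    """Train one base model for every (encoding, learner) pair."""
    fitted = []
    for encode in ENCODINGS:
        X = [encode(s) for s in seqs]
        for make_learner in LEARNERS:
            fitted.append((encode, make_learner().fit(X, labels)))
    return fitted


def probabilistic_features(seq, fitted):
    """One predicted probability per base model, in a fixed order."""
    return [model.predict_proba(encode(seq)) for encode, model in fitted]


# Hypothetical toy data; real training sets are far larger.
positives = ["ACGTCGGTA", "TCGACGGAT", "GCGTCGGCA"]
negatives = ["ATATCATAT", "TTAACTTAA", "AATTCAATT"]
fitted = fit_base_models(positives + negatives, [1] * 3 + [0] * 3)
features = probabilistic_features("ACGTCGGTA", fitted)
print(len(features), [round(f, 2) for f in features])
```

With three encodings and two learners the vector has six entries; the description's 56 features arise from the same kind of pairing at larger scale. In a real stacking setup the probabilities for training sequences would normally come from cross-validated predictions rather than from models fitted on those same sequences, so that the meta-learner is not trained on leaked labels.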
first_indexed 2024-12-22T08:24:07Z
id doaj.art-7a2c98b827304e76a0fb01e7493c5a1b
institution Directory Open Access Journal
issn 2162-2531
last_indexed 2024-12-22T08:24:07Z
spelling doaj.art-7a2c98b827304e76a0fb01e7493c5a1b; 2022-12-21T18:32:40Z; eng; Elsevier; Molecular Therapy: Nucleic Acids; ISSN 2162-2531; 2019-06-01; vol. 16, pp. 733–744
Author affiliations:
Balachandran Manavalan: Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
Shaherin Basith: Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
Tae Hwan Shin: Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
Leyi Wei (corresponding author): School of Computer Science and Technology, Tianjin University, China
Gwang Lee (corresponding author): Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea