A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes

Bibliographic Details
Main Authors: Lokuthota Hewage Roland, Champi Thusangi Wannige
Format: Article
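Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: DNA N6-methyladenine; sequence analysis; deep learning; eukaryotes; DNA sequence encoding method
Online Access: https://ieeexplore.ieee.org/document/9203883/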
author Lokuthota Hewage Roland
Champi Thusangi Wannige
collection DOAJ
description DNA N6-methyladenine (6mA) is an epigenetic modification involved in many biological regulatory processes, including DNA replication, DNA repair, transcription, and regulation of gene expression. Until recently, it was unclear how widespread 6mA is in eukaryotes. Studying the genome-wide distribution of 6mA can provide a deeper understanding of this epigenetic modification and the biological processes in which it takes part. Existing experimental techniques are time-consuming, and computational machine learning methods leave room for improvement; in particular, cross-species prediction of 6mA sites in eukaryotes performs poorly. Hence, there is a need for a more accurate, time-efficient method to predict the distribution of 6mA sites in eukaryotes. Because deep learning architectures have shown higher accuracy, we develop 6mAVGG, a convolutional neural network model based on a customized VGG16 architecture, for predicting DNA 6mA sites in eukaryotes. We introduce a novel 3-dimensional encoding mechanism that extends one-hot encoding to match the input of the VGG16 model. In 10-fold cross-validation on the benchmark datasets, the proposed model achieves accuracies of 98.01%, 97.44%, and 99.56% for the cross-species, rice, and M. musculus genomes, respectively. It outperforms existing tools for 6mA site prediction, improving accuracy by 2.88%, 4.2%, and 0.9% for the cross-species, rice, and M. musculus genomes, respectively, over the state-of-the-art method SNNRice6mA. The model trained on the benchmark data predicts 6mA sites in other species (Arabidopsis thaliana, Rosa chinensis, Drosophila, and yeast) with accuracy above 70%. Thus, this model can be used for genome-wide prediction of 6mA sites in eukaryotes.
first_indexed 2024-12-17T05:13:45Z
format Article
id doaj.art-d21a046c5fb049abac0ee4ec720f89f9
institution Directory of Open Access Journals
issn 2169-3536
language English
last_indexed 2024-12-17T05:13:45Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-d21a046c5fb049abac0ee4ec720f89f9; 2022-12-21T22:02:11Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 175535; 175545; 10.1109/ACCESS.2020.3025990; 9203883; A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes; Lokuthota Hewage Roland (0); Champi Thusangi Wannige (1); https://orcid.org/0000-0003-1659-3245; Department of Computer Science, University of Ruhuna, Matara, Sri Lanka; Department of Computer Science, University of Ruhuna, Matara, Sri Lanka; https://ieeexplore.ieee.org/document/9203883/; DNA N6-methyladenine; sequence analysis; deep learning; eukaryotes; DNA sequence encoding method
title A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes
topic DNA N6-methyladenine
sequence analysis
deep learning
eukaryotes
DNA sequence encoding method
url https://ieeexplore.ieee.org/document/9203883/