A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes
DNA N6-methyladenine (6mA) is an epigenetic modification, which is involved in many biological regulation processes like DNA replication, DNA repair, transcription, and gene expression regulation. The widespread presence of this 6mA modification in eukaryotes has been unclear until recently. Studyin...
Main Authors: | Lokuthota Hewage Roland, Champi Thusangi Wannige |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | DNA N6-methyladenine; sequence analysis; deep learning; eukaryotes; DNA sequence encoding method |
Online Access: | https://ieeexplore.ieee.org/document/9203883/ |
_version_ | 1818663243533516800 |
---|---|
author | Lokuthota Hewage Roland; Champi Thusangi Wannige |
author_facet | Lokuthota Hewage Roland; Champi Thusangi Wannige |
author_sort | Lokuthota Hewage Roland |
collection | DOAJ |
description | DNA N6-methyladenine (6mA) is an epigenetic modification involved in many biological regulation processes, such as DNA replication, DNA repair, transcription, and gene expression regulation. The widespread presence of 6mA in eukaryotes was unclear until recently. Studying the genome-wide distribution of 6mA can provide a deeper understanding of the epigenetic modification process and the biological processes it involves. Existing experimental techniques are time-consuming, and computational machine learning methods have room for performance improvement. Cross-species 6mA prediction in eukaryotes also shows low performance. Hence, there is a need for a more accurate, time-efficient method to predict the distribution of 6mA sites in eukaryotes. Since deep learning architectures have shown higher accuracy, we develop 6mAVGG, a model based on a customized VGG16 convolutional neural network architecture, for the prediction of DNA 6mA sites in eukaryotes. We introduce a novel 3-dimensional encoding mechanism that extends the one-hot encoding method to match the input of the VGG16 model. In 10-fold cross-validation on the benchmark datasets, the proposed model achieves accuracies of 98.01%, 97.44%, and 99.56% for the cross-species, Rice, and M. musculus genomes, respectively. The proposed model outperforms existing tools for the prediction of 6mA sites and improves accuracy by 2.88%, 4.2%, and 0.9% for the cross-species, Rice, and M. musculus genomes, respectively, compared to the state-of-the-art method SNNRice6mA. The model trained with benchmark data predicts 6mA sites of the other species Arabidopsis thaliana, Rosa chinensis, Drosophila, and Yeast with prediction accuracy over 70%. Thus, this model can be used for the genome-wide prediction of 6mA sites in eukaryotes. |
first_indexed | 2024-12-17T05:13:45Z |
format | Article |
id | doaj.art-d21a046c5fb049abac0ee4ec720f89f9 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-17T05:13:45Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-d21a046c5fb049abac0ee4ec720f89f9; 2022-12-21T22:02:11Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 175535; 175545; 10.1109/ACCESS.2020.3025990; 9203883; A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes; Lokuthota Hewage Roland (0); Champi Thusangi Wannige (1); https://orcid.org/0000-0003-1659-3245; Department of Computer Science, University of Ruhuna, Matara, Sri Lanka; Department of Computer Science, University of Ruhuna, Matara, Sri Lanka; DNA N6-methyladenine (6mA) is an epigenetic modification involved in many biological regulation processes, such as DNA replication, DNA repair, transcription, and gene expression regulation. The widespread presence of 6mA in eukaryotes was unclear until recently. Studying the genome-wide distribution of 6mA can provide a deeper understanding of the epigenetic modification process and the biological processes it involves. Existing experimental techniques are time-consuming, and computational machine learning methods have room for performance improvement. Cross-species 6mA prediction in eukaryotes also shows low performance. Hence, there is a need for a more accurate, time-efficient method to predict the distribution of 6mA sites in eukaryotes. Since deep learning architectures have shown higher accuracy, we develop 6mAVGG, a model based on a customized VGG16 convolutional neural network architecture, for the prediction of DNA 6mA sites in eukaryotes. We introduce a novel 3-dimensional encoding mechanism that extends the one-hot encoding method to match the input of the VGG16 model. In 10-fold cross-validation on the benchmark datasets, the proposed model achieves accuracies of 98.01%, 97.44%, and 99.56% for the cross-species, Rice, and M. musculus genomes, respectively. The proposed model outperforms existing tools for the prediction of 6mA sites and improves accuracy by 2.88%, 4.2%, and 0.9% for the cross-species, Rice, and M. musculus genomes, respectively, compared to the state-of-the-art method SNNRice6mA. The model trained with benchmark data predicts 6mA sites of the other species Arabidopsis thaliana, Rosa chinensis, Drosophila, and Yeast with prediction accuracy over 70%. Thus, this model can be used for the genome-wide prediction of 6mA sites in eukaryotes.; https://ieeexplore.ieee.org/document/9203883/; DNA N6-methyladenine; sequence analysis; deep learning; eukaryotes; DNA sequence encoding method |
spellingShingle | Lokuthota Hewage Roland; Champi Thusangi Wannige; A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes; IEEE Access; DNA N6-methyladenine; sequence analysis; deep learning; eukaryotes; DNA sequence encoding method |
title | A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes |
title_full | A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes |
title_fullStr | A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes |
title_full_unstemmed | A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes |
title_short | A Deep Learning Model for Predicting DNA N6-Methyladenine (6mA) Sites in Eukaryotes |
title_sort | deep learning model for predicting dna n6 methyladenine 6ma sites in eukaryotes |
topic | DNA N6-methyladenine; sequence analysis; deep learning; eukaryotes; DNA sequence encoding method |
url | https://ieeexplore.ieee.org/document/9203883/ |
work_keys_str_mv | AT lokuthotahewageroland adeeplearningmodelforpredictingdnan6methyladenine6masitesineukaryotes AT champithusangiwannige adeeplearningmodelforpredictingdnan6methyladenine6masitesineukaryotes AT lokuthotahewageroland deeplearningmodelforpredictingdnan6methyladenine6masitesineukaryotes AT champithusangiwannige deeplearningmodelforpredictingdnan6methyladenine6masitesineukaryotes |
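
The description in this record mentions two techniques that a short example may make more concrete: a 3-dimensional encoding that extends one-hot encoding, and 10-fold cross-validation. The record does not say how the 3-dimensional encoding is built, so the sketch below is only an illustration under stated assumptions and not the authors' method. The channel order `ACGT`, the idea of copying the one-hot matrix across three channels, and the helper names `one_hot`, `to_3d`, and `k_fold_indices` are all hypothetical.

```python
from typing import List

# Assumed base order; the record does not specify one.
ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Return an L x 4 one-hot matrix; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def to_3d(matrix: List[List[int]], channels: int = 3) -> List[List[List[int]]]:
    """Hypothetical 3-D extension (an assumption, not the paper's scheme):
    copy each one-hot value across `channels`, giving an
    L x 4 x channels tensor shaped like a small image."""
    return [[[value] * channels for value in row] for row in matrix]


def k_fold_indices(n_samples: int, k: int = 10) -> List[List[int]]:
    """Split range(n_samples) into k contiguous folds whose sizes
    differ by at most one. No shuffling is done here."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

In a 10-fold run, each fold would serve once as the held-out test set while the remaining nine are used for training, and the reported accuracy would typically be the mean over the ten runs. In practice the samples would usually be shuffled before splitting, which this sketch leaves out.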