A CNN-Based RNA N6-Methyladenosine Site Predictor for Multiple Species Using Heterogeneous Features Representation

Post-transcriptional modification such as N6-methyladenosine (m6A) has a crucial role in the stability and regulation of gene expression. Therefore, the identification of m6A is highly required for understanding the functional mechanisms of biological processes. Several machine learning techniques b...

Bibliographic Details
Main Authors: Waleed Alam, Syed Danish Ali, Hilal Tayara, Kil to Chong
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
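Subjects: Post-transcription modification; RNA methylation; sequence analysis; convolutional neural network; deep learning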
Online Access: https://ieeexplore.ieee.org/document/9119435/
author Waleed Alam
Syed Danish Ali
Hilal Tayara
Kil to Chong
collection DOAJ
description Post-transcriptional modifications such as N6-methyladenosine (m6A) play a crucial role in the stability and regulation of gene expression. Identifying m6A sites is therefore essential for understanding the functional mechanisms of biological processes. Several machine learning techniques based on handcrafted feature extraction methods have been proposed to facilitate this laborious work. However, because their feature extraction is inefficient, these techniques increase computational complexity and thereby impair the accuracy of m6A identification. This paper proposes a fast and reliable predictive model for the identification of m6A sites. The proposed model is based on a convolutional neural network (CNN) that extracts the most significant features from RNA sequences encoded by concatenating one-hot and nucleotide chemical property representations. The proposed model is applied to and tested on benchmark datasets from multiple species and evaluated against state-of-the-art predictive models. The results indicate that the proposed model achieves high accuracies of 93.6%, 93.8%, 85.0%, and 92.5% on the benchmark datasets of Homo sapiens (H. sapiens), Mus musculus (M. musculus), Saccharomyces cerevisiae (S. cerevisiae), and Arabidopsis thaliana (A. thaliana), respectively. The proposed model could help the research community with m6A identification. In addition, an easy-to-use web server is available at https://home.jbnu.ac.kr/NSCL/pm6acnn.htm.
first_indexed 2024-12-23T23:43:19Z
format Article
id doaj.art-8c83918faa0c436c9652465246c04fb1
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-23T23:43:19Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-8c83918faa0c436c9652465246c04fb1
2022-12-21T17:25:36Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
8
138203
138209
10.1109/ACCESS.2020.3002995
9119435
A CNN-Based RNA N6-Methyladenosine Site Predictor for Multiple Species Using Heterogeneous Features Representation
Waleed Alam 0 https://orcid.org/0000-0001-8622-4985
Syed Danish Ali 1 https://orcid.org/0000-0001-5204-5073
Hilal Tayara 2 https://orcid.org/0000-0001-5678-3479
Kil to Chong 3 https://orcid.org/0000-0002-1952-0001
Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, South Korea
Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, South Korea
Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, South Korea
Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, South Korea
https://ieeexplore.ieee.org/document/9119435/
Post-transcription modification
RNA methylation
sequence analysis
convolutional neural network
deep learning
title A CNN-Based RNA N6-Methyladenosine Site Predictor for Multiple Species Using Heterogeneous Features Representation
topic Post-transcription modification
RNA methylation
sequence analysis
convolutional neural network
deep learning
url https://ieeexplore.ieee.org/document/9119435/
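
The description field says the RNA sequences are encoded by concatenating one-hot and nucleotide chemical property features before being passed to the CNN. The following is a minimal sketch of what such an encoding could look like; it is not taken from the paper. The function name `encode_rna`, the channel order, the mapping of T to U, and the all-zero row for unknown bases are assumptions made for illustration. The three chemical-property bits (ring structure, functional group, hydrogen-bond strength) follow a convention commonly used in the m6A prediction literature, which this record does not confirm.

```python
# One-hot channels in the (assumed) order A, C, G, U.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# Nucleotide chemical properties (assumed convention):
# (purine ring, amino group, weak hydrogen bonding).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_rna(sequence):
    """Return one 7-value row per nucleotide: 4 one-hot bits + 3 property bits."""
    rows = []
    # Assumption: DNA-style input is accepted by treating T as U.
    for base in sequence.upper().replace("T", "U"):
        if base in ONE_HOT:
            rows.append(list(ONE_HOT[base] + NCP[base]))
        else:
            # Assumption: unknown symbols (e.g. N) become an all-zero row.
            rows.append([0] * 7)
    return rows


if __name__ == "__main__":
    for base, row in zip("GGACU", encode_rna("GGACU")):
        print(base, row)
```

Each sequence of length L becomes an L x 7 matrix, which is the kind of input a one-dimensional convolutional layer can take, with the seven features acting as channels.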