A CNN-Based RNA N6-Methyladenosine Site Predictor for Multiple Species Using Heterogeneous Features Representation
Post-transcriptional modifications such as N6-methyladenosine (m6A) play a crucial role in the stability and regulation of gene expression. Identifying m6A sites is therefore essential for understanding the functional mechanisms of biological processes. Several machine learning techniques b...
Main Authors: | Waleed Alam, Syed Danish Ali, Hilal Tayara, Kil to Chong |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
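Subjects: | Post-transcription modification; RNA methylation; sequence analysis; convolutional neural network; deep learning |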
Online Access: | https://ieeexplore.ieee.org/document/9119435/ |
_version_ | 1819276633487441920 |
---|---|
author | Waleed Alam; Syed Danish Ali; Hilal Tayara; Kil to Chong |
author_sort | Waleed Alam |
collection | DOAJ |
description | Post-transcriptional modifications such as N6-methyladenosine (m6A) play a crucial role in the stability and regulation of gene expression. Identifying m6A sites is therefore essential for understanding the functional mechanisms of biological processes. Several machine learning techniques based on handcrafted feature extraction methods have been proposed to facilitate this laborious work. However, because their feature extraction is inefficient, these techniques increase computational complexity and thereby degrade the accuracy of m6A identification. This paper proposes a fast and reliable predictive model for the identification of m6A sites. The proposed model is based on a convolutional neural network (CNN) that extracts the most significant features from RNA sequences encoded by concatenating one-hot and nucleotide chemical property representations. The proposed model is applied to and tested on benchmark datasets from multiple species and evaluated against state-of-the-art predictive models. The results indicate that the proposed model achieves accuracies of 93.6%, 93.8%, 85.0%, and 92.5% on the benchmark datasets of Homo sapiens (H. sapiens), Mus musculus (M. musculus), Saccharomyces cerevisiae (S. cerevisiae), and Arabidopsis thaliana (A. thaliana), respectively. The proposed model could help the research community identify m6A sites. In addition, an easy-to-use web server is available at https://home.jbnu.ac.kr/NSCL/pm6acnn.htm. |
first_indexed | 2024-12-23T23:43:19Z |
format | Article |
id | doaj.art-8c83918faa0c436c9652465246c04fb1 |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-23T23:43:19Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-8c83918faa0c436c9652465246c04fb1; 2022-12-21T17:25:36Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 138203; 138209; 10.1109/ACCESS.2020.3002995; 9119435; A CNN-Based RNA N6-Methyladenosine Site Predictor for Multiple Species Using Heterogeneous Features Representation; Waleed Alam (https://orcid.org/0000-0001-8622-4985); Syed Danish Ali (https://orcid.org/0000-0001-5204-5073); Hilal Tayara (https://orcid.org/0000-0001-5678-3479); Kil to Chong (https://orcid.org/0000-0002-1952-0001); Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, South Korea; https://ieeexplore.ieee.org/document/9119435/; Post-transcription modification; RNA methylation; sequence analysis; convolutional neural network; deep learning |
title | A CNN-Based RNA N6-Methyladenosine Site Predictor for Multiple Species Using Heterogeneous Features Representation |
title_sort | cnn based rna n6 methyladenosine site predictor for multiple species using heterogeneous features representation |
topic | Post-transcription modification; RNA methylation; sequence analysis; convolutional neural network; deep learning |
url | https://ieeexplore.ieee.org/document/9119435/ |
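
The description in this record says the model's input is an RNA sequence encoded by concatenating one-hot and nucleotide chemical property features. The record does not give the exact tables, so the sketch below is only an illustration under stated assumptions: a one-hot column order of A, C, G, U, and the commonly used three-bit chemical property scheme (ring structure, functional group, hydrogen bonding). The function name `encode_rna` and the all-zero handling of unknown symbols are hypothetical choices, not taken from the paper.

```python
# Assumed lookup tables; the catalog record does not specify them.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# Assumed chemical property bits per nucleotide:
# (ring structure: purine=1, functional group: amino=1, hydrogen bonding: weak=1)
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_rna(sequence):
    """Return one 7-element row per nucleotide: 4 one-hot values + 3 property bits."""
    rows = []
    # Accept DNA-style input by mapping T to U (an illustrative convenience).
    for base in sequence.upper().replace("T", "U"):
        if base in ONE_HOT:
            rows.append(list(ONE_HOT[base] + NCP[base]))
        else:
            # Unknown symbols such as "N" become an all-zero row (hypothetical choice).
            rows.append([0] * 7)
    return rows


encoded = encode_rna("GGACU")
print(len(encoded), len(encoded[0]))  # sequence length x 7 features
```

A matrix of this shape (sequence length by 7) is the kind of input a 1D convolutional layer could scan along the sequence axis; the actual network architecture is not described in this record.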