Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation

SNAREs (soluble N-ethylmaleimide-sensitive factor activating protein receptors) are a group of proteins that are crucial for membrane fusion and exocytosis of neurotransmitters from the cell. They play an important role in a broad range of cell processes, including cell growth, cytokinesis, and syna...

Bibliographic Details
Main Authors: Nguyen Quoc Khanh Le, Tuan-Tu Huynh
Format: Article
Language: English
Published: Frontiers Media S.A. 2019-12-01
Series: Frontiers in Physiology
Subjects: SNARE proteins; deep learning; convolutional neural networks; word embedding; skip-gram
Online Access: https://www.frontiersin.org/article/10.3389/fphys.2019.01501/full
author Nguyen Quoc Khanh Le
Tuan-Tu Huynh
collection DOAJ
description SNAREs (soluble N-ethylmaleimide-sensitive factor activating protein receptors) are a group of proteins that are crucial for membrane fusion and exocytosis of neurotransmitters from the cell. They play an important role in a broad range of cell processes, including cell growth, cytokinesis, and synaptic transmission, and promote cell membrane integration in eukaryotes. Many studies have associated SNARE proteins with numerous human diseases, especially cancer. Therefore, identifying their functions is a challenging problem whose solution would help scientists better understand cancer and design drug targets for treatment. We described each protein sequence using amino acid embeddings generated by fastText, a natural language processing model that performs well in its field. Because each protein sequence resembles a sentence composed of different words, applying a language model to protein sequences is both challenging and promising. Once generated, the amino acid embedding features were fed into a deep learning algorithm for prediction. Our model, which combines fastText with deep convolutional neural networks, identified SNARE proteins with an independent test accuracy of 92.8%, sensitivity of 88.5%, specificity of 97%, and Matthews correlation coefficient (MCC) of 0.86. These results were superior to those of the state-of-the-art predictor (SNARE-CNN). We suggest this study as a reliable method for biologists to identify SNAREs; it also serves as a basis for applying the fastText word embedding model in bioinformatics, especially in protein sequence prediction.
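The four figures quoted in the description (accuracy, sensitivity, specificity, and MCC) are all derived from the counts of a binary confusion matrix. The following is a minimal sketch of the standard formulas, not the authors' evaluation code; the function name `binary_metrics` and the counts in the usage line are hypothetical and are not taken from the article.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn count positive (e.g. SNARE) samples predicted correctly/incorrectly;
    tn/fp count negative (e.g. non-SNARE) samples predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): share of true positives among all actual positives.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of true negatives among all actual negatives.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the call.
metrics = binary_metrics(tp=8, fp=1, tn=9, fn=2)
print(metrics)
```

Because MCC uses all four cells of the confusion matrix, it is less sensitive to class imbalance than accuracy, which is one reason it is commonly reported alongside sensitivity and specificity.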
first_indexed 2024-12-12T01:01:11Z
format Article
id doaj.art-6d38d3487cf14a4f9e8ef1862c6085f2
institution Directory Open Access Journal
issn 1664-042X
language English
last_indexed 2024-12-12T01:01:11Z
publishDate 2019-12-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Physiology
spelling doaj.art-6d38d3487cf14a4f9e8ef1862c6085f2
2022-12-22T00:43:44Z
eng
Frontiers Media S.A.
Frontiers in Physiology
1664-042X
2019-12-01
10
10.3389/fphys.2019.01501
462575
Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation
Nguyen Quoc Khanh Le (0)
Tuan-Tu Huynh (1, 2)
(0) Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan
(1) Department of Electrical Electronic and Mechanical Engineering, Lac Hong University, Bien Hoa, Vietnam
(2) Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
title Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation
topic SNARE proteins
deep learning
convolutional neural networks
word embedding
skip-gram
url https://www.frontiersin.org/article/10.3389/fphys.2019.01501/full