Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation
Main Authors: | Nguyen Quoc Khanh Le, Tuan-Tu Huynh |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2019-12-01 |
Series: | Frontiers in Physiology |
Subjects: | SNARE proteins, deep learning, convolutional neural networks, word embedding, skip-gram |
Online Access: | https://www.frontiersin.org/article/10.3389/fphys.2019.01501/full |
author | Nguyen Quoc Khanh Le; Tuan-Tu Huynh |
---|---|
collection | DOAJ |
description | SNAREs (soluble N-ethylmaleimide-sensitive factor attachment protein receptors) are a group of proteins that are crucial for membrane fusion and for the exocytosis of neurotransmitters from the cell. They play an important role in a broad range of cellular processes, including cell growth, cytokinesis, and synaptic transmission, to promote cell membrane integration in eukaryotes. Many studies have found that SNARE proteins are associated with a number of human diseases, especially cancer. Identifying their functions is therefore an important and challenging problem, both for better understanding cancer and for designing drug targets for treatment. We represented each protein sequence by amino acid embeddings generated with fastText, a natural language processing model that performs well in its field. Because a protein sequence resembles a sentence made up of different words, applying language models to protein sequences is both challenging and promising. The generated amino acid embedding features were then fed into a deep learning algorithm for prediction. Our model, which combines fastText with deep convolutional neural networks, identified SNARE proteins with an independent test accuracy of 92.8%, sensitivity of 88.5%, specificity of 97%, and Matthews correlation coefficient (MCC) of 0.86. This performance was superior to that of the state-of-the-art predictor SNARE-CNN. We propose this study as a reliable method for SNARE identification by biologists and as a basis for applying the fastText word embedding model in bioinformatics, especially in protein sequence prediction. |
format | Article |
id | doaj.art-6d38d3487cf14a4f9e8ef1862c6085f2 |
institution | Directory Open Access Journal |
issn | 1664-042X |
language | English |
publishDate | 2019-12-01 |
publisher | Frontiers Media S.A. |
series | Frontiers in Physiology |
doi | 10.3389/fphys.2019.01501 |
volume | 10 |
affiliations | Nguyen Quoc Khanh Le: Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan; Tuan-Tu Huynh: Department of Electrical Electronic and Mechanical Engineering, Lac Hong University, Bien Hoa, Vietnam, and Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan |
title | Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation |
topic | SNARE proteins, deep learning, convolutional neural networks, word embedding, skip-gram |
url | https://www.frontiersin.org/article/10.3389/fphys.2019.01501/full |
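The abstract treats a protein sequence as a sentence whose "words" are embedded with fastText and then passed to a convolutional neural network. The stdlib-only sketch below illustrates that idea under loudly stated assumptions. It splits a sequence into overlapping 3-mers as words (an assumption: this record does not give the authors' tokenization). It uses seeded random vectors as stand-ins for trained fastText skip-gram vectors (hypothetical; a real pipeline would train them). It stacks them into a zero-padded matrix and applies one convolution filter with ReLU and global max pooling, the basic building block of a 1D CNN. All function names are hypothetical, and the authors' actual layer counts, filter sizes and embedding dimension are not given here.

```python
import random
from typing import Dict, List

def sequence_to_words(seq: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mer "words".
    Assumption: overlapping 3-mers; the published tokenization may differ."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def toy_embeddings(words: List[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Hypothetical stand-in for trained fastText vectors: seeded random values."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in sorted(set(words))}

def embed(words: List[str], table: Dict[str, List[float]], max_len: int) -> List[List[float]]:
    """Stack word vectors into a max_len x dim matrix, truncating or zero-padding."""
    dim = len(next(iter(table.values())))
    rows = [table[w] for w in words[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows

def conv1d_max(matrix: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> float:
    """Slide one filter (width x dim) along the sequence axis,
    apply ReLU to each window score, then take the global maximum."""
    width = len(kernel)
    scores = []
    for start in range(len(matrix) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(a * b for a, b in zip(matrix[start + offset], kernel[offset]))
        scores.append(max(0.0, s))
    return max(scores)

# Usage on a short made-up sequence (not from the paper's data).
words = sequence_to_words("MSEQNK")           # four overlapping 3-mers
table = toy_embeddings(words, dim=4)
x = embed(words, table, max_len=6)            # 6 x 4 input matrix
feature = conv1d_max(x, [[0.1] * 4 for _ in range(3)])
print(words, round(feature, 4))
```

In a full model many such filters would run in parallel, and their pooled outputs would feed dense layers and a final classifier. Max pooling makes each filter's output independent of where in the sequence its best-matching window occurs.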
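The reported results use four standard binary-classification metrics. A minimal sketch of how they follow from a confusion matrix, treating SNAREs as the positive class, is given below. The function name `binary_metrics` is hypothetical, and the counts in the usage line are made up for illustration; they are not the paper's test-set counts. MCC uses all four cells of the confusion matrix, which makes it more informative than accuracy alone when the positive and negative sets differ in size.

```python
import math
from typing import Dict

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient.
    tp/fn: SNAREs predicted correctly/incorrectly; tn/fp: non-SNAREs likewise."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # fraction of true SNAREs recovered
    specificity = tn / (tn + fp)   # fraction of non-SNAREs correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}

# Hypothetical counts, for illustration only.
print(binary_metrics(tp=8, fp=1, tn=9, fn=2))
```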