NeuroCNN_GNB: an ensemble model to predict neuropeptides based on a convolution neural network and Gaussian naive Bayes

Neuropeptides contain more chemical information than other classical neurotransmitters and have multiple receptor recognition sites. These characteristics allow neuropeptides to have a correspondingly higher selectivity for nerve receptors and fewer side effects. Traditional experimental methods, su...

Bibliographic Details
Main Authors: Di Liu, Zhengkui Lin, Cangzhi Jia
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-07-01
Series: Frontiers in Genetics
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2023.1226905/full
author Di Liu
Zhengkui Lin
Cangzhi Jia
collection DOAJ
description Neuropeptides carry more chemical information than classical neurotransmitters and have multiple receptor recognition sites. These characteristics give neuropeptides correspondingly higher selectivity for nerve receptors and fewer side effects. Traditional experimental methods, such as mass spectrometry and liquid chromatography, still depend on a complete neuropeptide precursor database and on knowledge of the basic characteristics of neuropeptides. Incomplete neuropeptide precursor and information databases lead to false positives or reduce the sensitivity of recognition. In recent years, studies have shown that machine learning methods can predict neuropeptides rapidly and effectively. In this work, we made a systematic attempt to create an ensemble tool based on four convolution neural network models. These baseline models were trained separately on one-hot encoding, AAIndex, G-gap dipeptide encoding and word2vec, and were integrated using Gaussian Naive Bayes (NB) to construct our predictor, designated NeuroCNN_GNB. Both 5-fold cross-validation on benchmark datasets and independent tests showed that NeuroCNN_GNB outperformed other state-of-the-art methods. Furthermore, the framework uses the Shapley Additive exPlanation (SHAP) algorithm to interpret model behavior, highlighting the features most relevant for predicting neuropeptides.
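The description names two of the sequence encodings (one-hot and G-gap dipeptide) and a Gaussian naive Bayes integration step without giving details. The following is a minimal illustrative sketch in plain Python, not the authors' implementation. It assumes that the integration step takes the four base-model probabilities as its input features, that one-hot matrices are zero-padded to a fixed length, and that G-gap frequencies are normalised over valid residue pairs. The names `one_hot`, `g_gap_dipeptide` and `GaussianNBStacker` are hypothetical.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq, max_len):
    """Encode a peptide as a max_len x 20 binary matrix (zero-padded; assumption)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq[:max_len]):
        if aa in index:  # non-standard residues are left as all-zero rows
            matrix[pos][index[aa]] = 1
    return matrix


def g_gap_dipeptide(seq, g):
    """Frequencies of residue pairs with g residues between them (400 values)."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        key = seq[i] + seq[i + g + 1]
        if key in counts:  # skip pairs containing non-standard residues
            counts[key] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


class GaussianNBStacker:
    """Gaussian naive Bayes over base-model scores (hypothetical meta-learner)."""

    def fit(self, X, y):
        # Per class: log prior, per-feature mean, per-feature variance.
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            means = [sum(c) / len(rows) for c in cols]
            variances = [
                sum((v - m) ** 2 for v in c) / len(rows) + 1e-9  # avoid zero variance
                for c, m in zip(cols, means)
            ]
            self.stats[label] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def predict(self, x):
        # Return the class with the highest log posterior for one sample.
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances)
            )

        return max(self.stats, key=log_posterior)
```

In this sketch each row passed to `GaussianNBStacker.fit` would hold four numbers, one predicted probability per base model, and the label marks the sequence as neuropeptide or not. The record does not say how the base-model outputs were produced for stacking, so that step is left out.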
first_indexed 2024-03-12T21:30:11Z
format Article
id doaj.art-02890004027c4ec199b3aa9ed3c8d4ce
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-03-12T21:30:11Z
publishDate 2023-07-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-02890004027c4ec199b3aa9ed3c8d4ce; 2023-07-27T21:35:55Z; eng; Frontiers Media S.A.; Frontiers in Genetics; ISSN 1664-8021; 2023-07-01; volume 14; DOI 10.3389/fgene.2023.1226905; article 1226905
Di Liu (Information Science and Technology College, Dalian Maritime University, Dalian, China)
Zhengkui Lin (Information Science and Technology College, Dalian Maritime University, Dalian, China)
Cangzhi Jia (School of Science, Dalian Maritime University, Dalian, China)
title NeuroCNN_GNB: an ensemble model to predict neuropeptides based on a convolution neural network and Gaussian naive Bayes
topic neuropeptides
word2vec
one-hot
stacking strategy
convolution neural network
url https://www.frontiersin.org/articles/10.3389/fgene.2023.1226905/full