pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model

Bibliographic Details
Main Authors: Pawel Pratyush, Suresh Pokharel, Hiroto Saigo, Dukka B. KC
Format: Article
Language: English
Published: BMC 2023-02-01
Series: BMC Bioinformatics
Subjects:
Online Access: https://doi.org/10.1186/s12859-023-05164-9
_version_ 1811165681658363904
author Pawel Pratyush
Suresh Pokharel
Hiroto Saigo
Dukka B. KC
author_facet Pawel Pratyush
Suresh Pokharel
Hiroto Saigo
Dukka B. KC
author_sort Pawel Pratyush
collection DOAJ
description Abstract. Background: Protein S-nitrosylation (SNO) plays a key role in transferring nitric oxide-mediated signals in both animals and plants and has emerged as an important mechanism for regulating protein function and cell signaling across all main classes of protein. It is involved in several biological processes, including immune response, protein stability, transcription regulation, post-translational regulation, DNA damage repair, and redox regulation, and is an emerging paradigm of redox signaling for protection against oxidative stress. The development of robust computational tools to predict protein SNO sites would contribute to further interpretation of the pathological and physiological mechanisms of SNO. Results: Using an intermediate fusion-based stacked generalization approach, we integrated embeddings from a supervised embedding layer and a contextualized protein language model (ProtT5) and developed a tool called pLMSNOSite (protein language model-based SNO site predictor). On an independent test set of experimentally identified SNO sites, pLMSNOSite achieved values of 0.340, 0.735, and 0.773 for MCC, sensitivity, and specificity, respectively. These results show that pLMSNOSite performs better than the compared approaches for the prediction of S-nitrosylation sites. Conclusion: Together, the experimental results suggest that pLMSNOSite achieves significant improvement in the prediction performance of S-nitrosylation sites and represents a robust computational approach for predicting protein S-nitrosylation sites. pLMSNOSite could be a useful resource for further elucidation of SNO and is publicly available at https://github.com/KCLabMTU/pLMSNOSite.
first_indexed 2024-04-10T15:40:40Z
format Article
id doaj.art-24358939c36148d098270c815601afab
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-10T15:40:40Z
publishDate 2023-02-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-24358939c36148d098270c815601afab | 2023-02-12T12:24:24Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2023-02-01 | 24 | 1 | 1 | 20 | 10.1186/s12859-023-05164-9 | pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model | Pawel Pratyush [0] | Suresh Pokharel [1] | Hiroto Saigo [2] | Dukka B. KC [3] | Department of Computer Science, Michigan Technological University | Department of Computer Science, Michigan Technological University | Department of Electrical Engineering and Computer Science, Kyushu University | Department of Computer Science, Michigan Technological University | Abstract Background Protein S-nitrosylation (SNO) plays a key role in transferring nitric oxide-mediated signals in both animals and plants and has emerged as an important mechanism for regulating protein functions and cell signaling of all main classes of protein. It is involved in several biological processes including immune response, protein stability, transcription regulation, post translational regulation, DNA damage repair, redox regulation, and is an emerging paradigm of redox signaling for protection against oxidative stress. The development of robust computational tools to predict protein SNO sites would contribute to further interpretation of the pathological and physiological mechanisms of SNO. Results Using an intermediate fusion-based stacked generalization approach, we integrated embeddings from supervised embedding layer and contextualized protein language model (ProtT5) and developed a tool called pLMSNOSite (protein language model-based SNO site predictor). On an independent test set of experimentally identified SNO sites, pLMSNOSite achieved values of 0.340, 0.735 and 0.773 for MCC, sensitivity and specificity respectively. These results show that pLMSNOSite performs better than the compared approaches for the prediction of S-nitrosylation sites. Conclusion Together, the experimental results suggest that pLMSNOSite achieves significant improvement in the prediction performance of S-nitrosylation sites and represents a robust computational approach for predicting protein S-nitrosylation sites. pLMSNOSite could be a useful resource for further elucidation of SNO and is publicly available at https://github.com/KCLabMTU/pLMSNOSite. | https://doi.org/10.1186/s12859-023-05164-9 | S-nitrosylation | Deep learning | Convolutional neural network | Post-translational modification | Word embedding | Protein language model
spellingShingle Pawel Pratyush
Suresh Pokharel
Hiroto Saigo
Dukka B. KC
pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model
BMC Bioinformatics
S-nitrosylation
Deep learning
Convolutional neural network
Post-translational modification
Word embedding
Protein language model
title pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model
title_full pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model
title_fullStr pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model
title_full_unstemmed pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model
title_short pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model
title_sort plmsnosite an ensemble based approach for predicting protein s nitrosylation sites by integrating supervised word embedding and embedding from pre trained protein language model
topic S-nitrosylation
Deep learning
Convolutional neural network
Post-translational modification
Word embedding
Protein language model
url https://doi.org/10.1186/s12859-023-05164-9
work_keys_str_mv AT pawelpratyush plmsnositeanensemblebasedapproachforpredictingproteinsnitrosylationsitesbyintegratingsupervisedwordembeddingandembeddingfrompretrainedproteinlanguagemodel
AT sureshpokharel plmsnositeanensemblebasedapproachforpredictingproteinsnitrosylationsitesbyintegratingsupervisedwordembeddingandembeddingfrompretrainedproteinlanguagemodel
AT hirotosaigo plmsnositeanensemblebasedapproachforpredictingproteinsnitrosylationsitesbyintegratingsupervisedwordembeddingandembeddingfrompretrainedproteinlanguagemodel
AT dukkabkc plmsnositeanensemblebasedapproachforpredictingproteinsnitrosylationsitesbyintegratingsupervisedwordembeddingandembeddingfrompretrainedproteinlanguagemodel