pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties

Abstract Background: Protein histidine phosphorylation (pHis) plays critical roles in prokaryotic signal transduction pathways and various eukaryotic cellular processes. It is estimated to account for 6–10% of the phosphoproteome; however, only hundreds of pHis sites have been discovered to date. Due...


Bibliographic Details
Main Authors: Jian Zhao, Minhui Zhuang, Jingjing Liu, Meng Zhang, Cong Zeng, Bin Jiang, Jing Wu, Xiaofeng Song
Format: Article
Language: English
Published: BMC 2022-09-01
Series: BMC Bioinformatics
Subjects: Histidine phosphorylation; Phosphohistidine site; Machine learning; pHis prediction; pHisPred
Online Access: https://doi.org/10.1186/s12859-022-04938-x
author Jian Zhao
Minhui Zhuang
Jingjing Liu
Meng Zhang
Cong Zeng
Bin Jiang
Jing Wu
Xiaofeng Song
collection DOAJ
description Abstract Background: Protein histidine phosphorylation (pHis) plays critical roles in prokaryotic signal transduction pathways and various eukaryotic cellular processes. It is estimated to account for 6–10% of the phosphoproteome; however, only hundreds of pHis sites have been discovered to date. Owing to the inherent disadvantages of experimental methods, efficient computational approaches for identifying pHis sites are urgently needed. Results: Here, we present a novel tool, pHisPred, for accurately identifying pHis sites from protein sequences. We manually collected the largest number of experimentally validated pHis sites to build benchmark datasets. Under randomized tenfold cross-validation (CV), the weighted SVM-RBF model outperformed four other commonly used classification models (LR, KNN, RF, and MLP). From tens of thousands of features, the 140 and 150 most informative features were selected for the eukaryotic and prokaryotic models, respectively. The average AUC and F1-score values of pHisPred were (0.81, 0.40) and (0.78, 0.46) for tenfold CV on the eukaryotic and prokaryotic training datasets, respectively. In addition, pHisPred significantly outperforms other tools on the testing datasets, in particular on the eukaryotic one. Conclusion: We implemented pHisPred as a Python program, which is freely available for non-commercial use at https://github.com/xiaofengsong/pHisPred. Moreover, users can use it to train new models with their own data.
first_indexed 2024-04-12T02:46:18Z
format Article
id doaj.art-8104357d2c21495d8422b41ea4c39a63
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-12T02:46:18Z
publishDate 2022-09-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-8104357d2c21495d8422b41ea4c39a63
2022-12-22T03:51:10Z
eng
BMC
BMC Bioinformatics
1471-2105
2022-09-01
23 S3 1 17
10.1186/s12859-022-04938-x
pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties
Jian Zhao (0): Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics
Minhui Zhuang (1): Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics
Jingjing Liu (2): Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics
Meng Zhang (3): Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics
Cong Zeng (4): Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics
Bin Jiang (5): College of Automation Engineering, Nanjing University of Aeronautics and Astronautics
Jing Wu (6): School of Biomedical Engineering and Informatics, Nanjing Medical University
Xiaofeng Song (7): Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics
https://doi.org/10.1186/s12859-022-04938-x
Histidine phosphorylation
Phosphohistidine site
Machine learning
pHis prediction
pHisPred
title pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties
topic Histidine phosphorylation
Phosphohistidine site
Machine learning
pHis prediction
pHisPred
url https://doi.org/10.1186/s12859-022-04938-x
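
The description field above reports two evaluation metrics, AUC and F1-score. As a reading aid, the following is a minimal sketch of how the two are computed for a binary site classifier. It is not taken from pHisPred: the function names, the 0.5 decision threshold, and the toy labels and scores are all invented for illustration.

```python
# Illustrative only: names, threshold, and data are assumptions, not pHisPred code.

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 positive sites and 8 negative sites.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(f"F1  = {f1_score(y_true, y_pred):.3f}")  # F1  = 0.400
print(f"AUC = {auc(y_true, scores):.3f}")       # AUC = 0.875
```

AUC depends only on how the scores rank positives against negatives, while F1 depends on the predictions made at one chosen threshold, so the two values can differ considerably on the same data, as they do in this toy example.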