ProTstab – predictor for cellular protein stability

Abstract. Background: Stability is one of the most fundamental intrinsic characteristics of proteins and can be determined with various methods. Characterization of protein properties does not keep pace with the increase in new sequence data, and therefore even basic properties are not known for far majori...

Bibliographic Details
Main Authors: Yang Yang, Xuesong Ding, Guanchen Zhu, Abhishek Niroula, Qiang Lv, Mauno Vihinen
Format: Article
Language: English
Published: BMC 2019-11-01
Series: BMC Genomics
Subjects: Protein stability, Prediction, Machine learning, Proteome properties
Online Access: http://link.springer.com/article/10.1186/s12864-019-6138-7
_version_ 1831735032447762432
author Yang Yang
Xuesong Ding
Guanchen Zhu
Abhishek Niroula
Qiang Lv
Mauno Vihinen
author_sort Yang Yang
collection DOAJ
description Abstract. Background: Stability is one of the most fundamental intrinsic characteristics of proteins and can be determined with various methods. Characterization of protein properties does not keep pace with the increase in new sequence data, and therefore even basic properties are not known for the great majority of identified proteins. There have been some attempts to develop predictors for protein stabilities; however, they have suffered from small numbers of known examples. Results: We took advantage of results from a recently developed cellular stability method, which is based on limited proteolysis and mass spectrometry, and developed a machine learning method using gradient boosting of regression trees. The ProTstab method has high performance and is well suited for large-scale prediction of protein stabilities. Conclusions: The Pearson correlation coefficient was 0.793 in 10-fold cross-validation and 0.763 in an independent blind test. The corresponding values for mean absolute error are 0.024 and 0.036, respectively. Comparison with a previously published method indicated that ProTstab has superior performance. We used the method to predict stabilities of all the remaining proteins in the entire human proteome and then correlated the predicted stabilities with protein chain lengths of isoforms and with localizations of proteins.
first_indexed 2024-12-21T12:18:59Z
format Article
id doaj.art-08bdaf0b77f144ccb3b61f5177ca011e
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-12-21T12:18:59Z
publishDate 2019-11-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-08bdaf0b77f144ccb3b61f5177ca011e
2022-12-21T19:04:21Z
eng
BMC
BMC Genomics
1471-2164
2019-11-01
20119
10.1186/s12864-019-6138-7
ProTstab – predictor for cellular protein stability
Yang Yang (School of Computer Science and Technology, Soochow University)
Xuesong Ding (School of Computer Science and Technology, Soochow University)
Guanchen Zhu (School of Computer Science and Technology, Soochow University)
Abhishek Niroula (Department of Experimental Medical Science, BMC B13, Lund University)
Qiang Lv (School of Computer Science and Technology, Soochow University)
Mauno Vihinen (Department of Experimental Medical Science, BMC B13, Lund University)
http://link.springer.com/article/10.1186/s12864-019-6138-7
Protein stability
Prediction
Machine learning
Proteome properties
title ProTstab – predictor for cellular protein stability
topic Protein stability
Prediction
Machine learning
Proteome properties
url http://link.springer.com/article/10.1186/s12864-019-6138-7
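
The description field reports a Pearson correlation coefficient and a mean absolute error for the predictor. As a minimal sketch of how these two metrics are computed in general (this is not the authors' code; the function names and the toy values are hypothetical and are not data from the article):

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toy values, for illustration only (not from the article).
toy_observed = [0.50, 0.55, 0.60, 0.65]
toy_predicted = [0.52, 0.54, 0.63, 0.64]
print(pearson_r(toy_observed, toy_predicted))
print(mean_absolute_error(toy_observed, toy_predicted))
```

A higher correlation and a lower mean absolute error both indicate closer agreement between predicted and measured values, which is how the cross-validation and blind-test figures in the description can be read.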