Statistical modeling to quantify the uncertainty of FoldX-predicted protein folding and binding stability

Bibliographic Details
Main Authors: Yesol Sapozhnikov, Jagdish Suresh Patel, F. Marty Ytreberg, Craig R. Miller
Format: Article
Language: English
Published: BMC, 2023-11-01
Series: BMC Bioinformatics
Subjects: Protein stability; Protein mutations; Stability prediction; Error prediction; Statistical model
Online Access: https://doi.org/10.1186/s12859-023-05537-0
Collection: DOAJ

Abstract

Background: Computational methods of predicting protein stability changes upon missense mutations are invaluable tools in high-throughput studies involving a large number of protein variants. However, they are limited by a wide variation in accuracy and by the difficulty of assessing prediction uncertainty. Using a popular computational tool, FoldX, we develop a statistical framework that quantifies the uncertainty of predicted changes in protein stability.

Results: We show that multiple linear regression models can be used to quantify the uncertainty associated with FoldX predictions for individual mutations. Comparing the performance of models with varying degrees of complexity, we find that model precision improves significantly when molecular dynamics simulation is used as part of the FoldX workflow. Based on the model that incorporates information from molecular dynamics, biochemical properties, and FoldX energy terms, we can generally expect upper bounds on the uncertainty of ±2.9 kcal/mol for folding stability predictions and ±3.5 kcal/mol for binding stability predictions. The uncertainty for individual mutations varies; our model estimates it using FoldX energy terms, biochemical properties of the mutated residue, and the variability among snapshots from molecular dynamics simulation.

Conclusions: Using a linear regression framework, we construct models to predict the uncertainty associated with FoldX predictions of stability changes upon mutation. This technique is straightforward and can be extended to other computational methods as well.
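The Results statement describes multiple linear regression models that estimate per-mutation uncertainty from FoldX energy terms, biochemical properties of the mutated residue, and the variability among molecular dynamics snapshots. The following is a minimal Python/NumPy sketch of that general idea, not the authors' code. It assumes, as a simplification not stated in the abstract, that the regression response is the absolute difference between experimental and FoldX-predicted stability change (ΔΔG), that the standard deviation of FoldX ΔΔG across snapshots enters as one predictor alongside other features, and that the fitted value serves as a ± half-width around the snapshot-averaged FoldX estimate. All function names and the feature `rsa` (relative solvent accessibility) are hypothetical, and the toy numbers are invented for illustration only.

```python
import numpy as np


def snapshot_summary(ddg_snapshots):
    """Per-mutation mean and sample SD of FoldX ddG across MD snapshots.

    ddg_snapshots: 2-D array, one row per mutation, one column per snapshot.
    """
    ddg = np.asarray(ddg_snapshots, dtype=float)
    return ddg.mean(axis=1), ddg.std(axis=1, ddof=1)


def _design(features):
    """Prepend an intercept column to the predictor matrix."""
    X = np.asarray(features, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_error_model(features, abs_error):
    """Ordinary least-squares fit of |ddG_exp - ddG_FoldX| on the predictors.

    Returns coefficients [intercept, b1, b2, ...] (hypothetical model form).
    """
    y = np.asarray(abs_error, dtype=float)
    coef, *_ = np.linalg.lstsq(_design(features), y, rcond=None)
    return coef


def predict_uncertainty(coef, features):
    """Fitted absolute error, clipped at zero because a bound cannot be negative."""
    return np.clip(_design(features) @ coef, 0.0, None)


def foldx_interval(ddg_snapshots, other_features, coef):
    """Interval (mean ddG - u, mean ddG + u) per mutation.

    The snapshot SD is the first predictor, followed by other_features,
    matching the column order used when the model was fitted.
    """
    mean, sd = snapshot_summary(ddg_snapshots)
    X = np.column_stack([sd, np.asarray(other_features, dtype=float)])
    half_width = predict_uncertainty(coef, X)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Invented toy numbers for illustration only (not data from the paper).
    snapshots = np.array([[1.0, 1.2, 0.8],
                          [2.0, 2.5, 1.5],
                          [0.0, 0.1, -0.1],
                          [3.0, 3.0, 3.0],
                          [1.0, 2.0, 3.0]])
    rsa = np.array([0.1, 0.9, 0.4, 0.0, 0.6])  # hypothetical residue feature
    _, sd = snapshot_summary(snapshots)
    abs_error = 0.5 + 2.0 * sd + 1.0 * rsa      # toy "observed" errors
    coef = fit_error_model(np.column_stack([sd, rsa]), abs_error)
    lower, upper = foldx_interval(snapshots, rsa, coef)
    print(np.round(coef, 3))
    print(np.round(lower, 3), np.round(upper, 3))
```

Plain least squares via `np.linalg.lstsq` keeps the sketch dependency-light; a real analysis would also inspect residuals and validate the model on held-out mutations before trusting any per-mutation bound.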

ISSN: 1471-2105
Affiliations (University of Idaho): Program in Bioinformatics and Computational Biology; Department of Chemical and Biological Engineering; Department of Physics; Department of Biological Sciences
Volume: 24, issue 1, pages 1-18