The role of data imbalance bias in the prediction of protein stability change upon mutation

Bibliographic Details
Main Author: Jianwen Fang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2023-01-01
Series: PLoS ONE
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10062539/?tool=EBI
ISSN: 1932-6203
Volume: 18
Issue: 3
Source: Directory of Open Access Journals (DOAJ)

Abstract

There is a controversy over what causes the low robustness of some programs for predicting protein stability change upon mutation. Some researchers have suggested that low-quality data and insufficiently informative features are the primary reasons, while others have attributed the problem largely to a bias caused by data imbalance, since there are more destabilizing mutations than stabilizing ones. In this study, a simple approach was developed to construct a balanced dataset, which was then combined with a leave-one-protein-out approach to illustrate that this bias may not be the primary reason for poor performance. A balanced dataset with seemingly good conventional n-fold cross-validation (CV) results should not be used as proof that a model for predicting protein stability change upon mutation is robust. Thus, some existing algorithms need to be re-examined before any practical application. Moreover, future research should place more emphasis on obtaining data and features of higher quality and in greater quantity.
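The abstract contrasts conventional n-fold CV on a balanced dataset with leave-one-protein-out (LOPO) validation. The Python sketch below shows one possible reason the two can disagree, under stated assumptions. It balances the data by adding hypothetical reverse mutations, a common technique that relies on the antisymmetry ΔΔG(B→A) = −ΔΔG(A→B). The abstract does not say that this is the study's "simple approach". The `Mutation` fields, the sign convention (negative ΔΔG = destabilizing), and all function names are illustrative and are not taken from the paper. The sketch counts how often a test mutation's reverse partner is already in the training set.

```python
"""Sketch: reverse-mutation balancing vs. leave-one-protein-out (LOPO) splits.

Assumptions (not from the paper): the Mutation fields, the sign convention
(ddg < 0 means destabilizing), and balancing via reverse mutations are all
illustrative choices.
"""
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    protein: str   # hypothetical protein identifier
    wild: str      # wild-type residue (one-letter code)
    pos: int       # residue position
    mutant: str    # substituted residue
    ddg: float     # assumed convention: negative = destabilizing


def reverse(m: Mutation) -> Mutation:
    # Antisymmetry assumption: ddG(mutant -> wild) = -ddG(wild -> mutant).
    return Mutation(m.protein, m.mutant, m.pos, m.wild, -m.ddg)


def balance_with_reverse(data: list[Mutation]) -> list[Mutation]:
    # Each nonzero ddg gains an opposite-signed partner, so the numbers of
    # stabilizing and destabilizing entries become equal.
    return data + [reverse(m) for m in data]


def random_kfold(n: int, k: int, seed: int = 0) -> list[list[int]]:
    # Conventional n-fold CV: shuffle indices, then deal them into k test folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def leave_one_protein_out(data: list[Mutation]) -> list[list[int]]:
    # One test fold per protein, so a protein never appears in its own training set.
    groups: dict[str, list[int]] = defaultdict(list)
    for i, m in enumerate(data):
        groups[m.protein].append(i)
    return [groups[p] for p in sorted(groups)]


def leaked_pairs(data: list[Mutation], folds: list[list[int]]) -> int:
    # Count test mutations whose reverse partner sits in the training set.
    leaks = 0
    for test in folds:
        test_set = set(test)
        train = {data[i] for i in range(len(data)) if i not in test_set}
        leaks += sum(reverse(data[i]) in train for i in test)
    return leaks


if __name__ == "__main__":
    # Synthetic, deliberately imbalanced toy data (mostly destabilizing).
    rng = random.Random(42)
    aa = "ACDEFGHIKLMNPQRSTVWY"
    raw = [
        Mutation(f"P{p}", rng.choice(aa), pos, rng.choice(aa), rng.uniform(-3.0, 0.5))
        for p in range(5)
        for pos in range(1, 21)
    ]
    data = balance_with_reverse(raw)
    print("stabilizing:", sum(m.ddg > 0 for m in data),
          "destabilizing:", sum(m.ddg < 0 for m in data))
    print("pair leaks, random 5-fold:", leaked_pairs(data, random_kfold(len(data), 5)))
    print("pair leaks, LOPO:", leaked_pairs(data, leave_one_protein_out(data)))
```

With random folds, a model could partly score well by "remembering" the partner of each test mutation. Grouping by protein removes that shortcut, which is one way a seemingly good n-fold CV result on balanced data might overstate robustness.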