The role of data imbalance bias in the prediction of protein stability change upon mutation
There is a controversy over what causes the low robustness of some programs for predicting protein stability change upon mutation. Some researchers suggested that low-quality data and insufficiently informative features are the primary reasons, while others attributed the problem largely to a bias c...
Main Author: | Jianwen Fang |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2023-01-01 |
Series: | PLoS ONE |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10062539/?tool=EBI |
_version_ | 1797852136766177280 |
---|---|
author | Jianwen Fang |
collection | DOAJ |
description | There is a controversy over what causes the low robustness of some programs for predicting protein stability change upon mutation. Some researchers suggested that low-quality data and insufficiently informative features are the primary reasons, while others attributed the problem largely to a bias caused by data imbalance, as there are more destabilizing mutations than stabilizing ones. In this study, a simple approach was developed to construct a balanced dataset, which was then combined with a leave-one-protein-out approach to illustrate that the bias may not be the primary reason for poor performance. A balanced dataset with some seemingly good conventional n-fold CV results should not be used as proof that a model for predicting protein stability change upon mutation is robust. Thus, some of the existing algorithms need to be re-examined before any practical application. Also, more emphasis should be put on obtaining more and higher-quality data and features in future research. |
first_indexed | 2024-04-09T19:28:07Z |
format | Article |
id | doaj.art-bf2e6cb18bb64e319ce49a0af3465c04 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-04-09T19:28:07Z |
publishDate | 2023-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
title | The role of data imbalance bias in the prediction of protein stability change upon mutation |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10062539/?tool=EBI |
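
The description contrasts conventional n-fold cross-validation with a leave-one-protein-out evaluation. The sketch below is not the article's code; it is a minimal illustration, using hypothetical toy protein labels and helper names (`leave_one_protein_out`, `k_fold`, `proteins_in_both`), of why the two schemes can disagree: under plain n-fold splitting, mutations from the same protein can land in both the training and the test fold, whereas leave-one-protein-out holds out every mutation of one protein at a time.

```python
from collections import defaultdict


def leave_one_protein_out(protein_ids):
    """Yield (held_out_protein, train_indices, test_indices), one split per protein."""
    by_protein = defaultdict(list)
    for index, protein in enumerate(protein_ids):
        by_protein[protein].append(index)
    for protein in sorted(by_protein):
        test = by_protein[protein]
        held_out = set(test)
        train = [i for i in range(len(protein_ids)) if i not in held_out]
        yield protein, train, test


def k_fold(n_samples, k):
    """Yield (train_indices, test_indices) for k interleaved folds, ignoring proteins."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


def proteins_in_both(train, test, protein_ids):
    """Return the proteins that contribute mutations to both train and test."""
    return {protein_ids[i] for i in train} & {protein_ids[i] for i in test}


# Hypothetical toy data (an assumption): one protein label per mutation record.
protein_ids = ["P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3"]

lopo_overlaps = [
    proteins_in_both(train, test, protein_ids)
    for _, train, test in leave_one_protein_out(protein_ids)
]
kfold_overlaps = [
    proteins_in_both(train, test, protein_ids)
    for train, test in k_fold(len(protein_ids), 4)
]

print("leave-one-protein-out overlaps:", lopo_overlaps)
print("4-fold overlaps:", kfold_overlaps)
```

In this toy setup every 4-fold split shares at least one protein between training and test data, while no leave-one-protein-out split does. Good n-fold scores on a balanced dataset can therefore still reflect within-protein similarity rather than robustness, which is the caution the description raises.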