Mind your prevalence!

Abstract: Multiple metrics are used when assessing and validating the performance of quantitative structure–activity relationship (QSAR) models. In the case of binary classification, balanced accuracy is a metric to assess the global performance of such models. In contrast to accuracy, balanced accur...

Bibliographic Details
Main Authors:	Sébastien J. J. Guesné, Thierry Hanser, Stéphane Werner, Samuel Boobier, Shaylyn Scott
Format:	Article
Language:	English
Published:	BMC 2024-04-01
Series:	Journal of Cheminformatics
Subjects:	Prevalence; Prevalence shift; Imbalanced; Balanced Matthews’ correlation coefficient; Calibrated metrics; Balanced metrics
Online Access:	https://doi.org/10.1186/s13321-024-00837-w

collection	DOAJ
description	Abstract: Multiple metrics are used when assessing and validating the performance of quantitative structure–activity relationship (QSAR) models. In the case of binary classification, balanced accuracy is a metric to assess the global performance of such models. In contrast to accuracy, balanced accuracy does not depend on the respective prevalence of the two categories in the test set that is used to validate a QSAR classifier. As such, balanced accuracy is used to overcome the effect of imbalanced test sets on the model’s perceived accuracy. Matthews' correlation coefficient (MCC), an alternative global performance metric, is also known to mitigate the imbalance of the test set. However, in contrast to the balanced accuracy, MCC remains dependent on the respective prevalence of the predicted categories. For simplicity, the rest of this work is based on the positive prevalence. The MCC value may be underestimated at high or extremely low positive prevalence. It contributes to more challenging comparisons between experiments using test sets with different positive prevalences and may lead to incorrect interpretations. The concept of balanced metrics beyond balanced accuracy is, to the best of our knowledge, not yet described in the cheminformatic literature. Therefore, after describing the relevant literature, this manuscript will first formally define a confusion matrix, sensitivity and specificity and then present, with synthetic data, the danger of comparing performance metrics under nonconstant prevalence. Second, it will demonstrate that balanced accuracy is the performance metric accuracy calibrated to a test set with a positive prevalence of 50% (i.e., balanced test set). This concept of balanced accuracy will then be extended to the MCC after showing its dependency on the positive prevalence. Applying the same concept to any other performance metric and widening it to the concept of calibrated metrics will then be briefly discussed. We will show that, like balanced accuracy, any balanced performance metric may be expressed as a function of the well-known values of sensitivity and specificity. Finally, a tale of two MCCs will exemplify the use of this concept of balanced MCC versus MCC with four use cases using synthetic data. Scientific contribution: This work provides a formal, unified framework for understanding prevalence dependence in model validation metrics, deriving balanced metric expressions beyond balanced accuracy, and demonstrating their practical utility for common use cases. In contrast to prior literature, it introduces the derived confusion matrix to express metrics as functions of sensitivity, specificity and prevalence without needing additional coefficients. The manuscript extends the concept of balanced metrics to Matthews' correlation coefficient and other widely used performance indicators, enabling robust comparisons under prevalence shifts.
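The idea summarised in the abstract can be illustrated with a minimal sketch. It assumes that the "derived confusion matrix" can be written as cell fractions built from sensitivity, specificity and positive prevalence, and that a "balanced" metric is the same metric evaluated at a positive prevalence of 50%. The function names and the zero-denominator convention below are illustrative assumptions and are not taken from the article.

```python
import math


def derived_confusion_matrix(sensitivity, specificity, prevalence):
    """Return (tp, fp, tn, fn) as fractions of the test set.

    Assumption: the cells are expressed from sensitivity, specificity
    and positive prevalence only, so they sum to 1.
    """
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp, fp, tn, fn


def accuracy(sensitivity, specificity, prevalence):
    """Accuracy as a function of sensitivity, specificity and prevalence."""
    tp, _fp, tn, _fn = derived_confusion_matrix(sensitivity, specificity, prevalence)
    return tp + tn


def mcc(sensitivity, specificity, prevalence):
    """Matthews' correlation coefficient from the derived confusion matrix."""
    tp, fp, tn, fn = derived_confusion_matrix(sensitivity, specificity, prevalence)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0.0:
        # Convention chosen for this sketch only: report 0 when undefined.
        return 0.0
    return (tp * tn - fp * fn) / denominator


def balanced_accuracy(sensitivity, specificity):
    """Accuracy calibrated to a positive prevalence of 50%."""
    return accuracy(sensitivity, specificity, 0.5)


def balanced_mcc(sensitivity, specificity):
    """MCC calibrated to a positive prevalence of 50%."""
    return mcc(sensitivity, specificity, 0.5)


if __name__ == "__main__":
    se, sp = 0.9, 0.8  # hypothetical classifier, fixed across test sets
    for p in (0.01, 0.1, 0.5, 0.9, 0.99):
        print(f"prevalence={p:.2f}  accuracy={accuracy(se, sp, p):.3f}  "
              f"MCC={mcc(se, sp, p):.3f}")
    print(f"balanced accuracy={balanced_accuracy(se, sp):.3f}  "
          f"balanced MCC={balanced_mcc(se, sp):.3f}")
```

With sensitivity and specificity held fixed, the loop shows accuracy and MCC moving with the positive prevalence, while the balanced variants depend on sensitivity and specificity alone. Under these assumptions, balanced accuracy reduces to (sensitivity + specificity) / 2 and balanced MCC to (sensitivity + specificity - 1) / sqrt(1 - (sensitivity - specificity)^2).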
first_indexed	2024-04-24T07:12:16Z
id	doaj.art-d0673062d6b34d8ba9dfa37669f04284
institution	Directory Open Access Journal
issn	1758-2946
last_indexed	2024-04-24T07:12:16Z
spelling	doaj.art-d0673062d6b34d8ba9dfa37669f04284 2024-04-21T11:28:21Z eng BMC Journal of Cheminformatics 1758-2946 2024-04-01 161113 10.1186/s13321-024-00837-w Mind your prevalence! Sébastien J. J. Guesné (Lhasa Limited), Thierry Hanser (Lhasa Limited), Stéphane Werner (Lhasa Limited), Samuel Boobier (Lhasa Limited), Shaylyn Scott (Lhasa Limited)
title	Mind your prevalence!

Similar Items