Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets

With the increasingly important role of machine learning (ML) models in chemical research, the need to attach a level of confidence to model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years but consensus on the evaluation...

Bibliographic Details
Main Authors:	Rasmussen, Maria H., Duan, Chenru, Kulik, Heather J., Jensen, Jan H.
Other Authors:	Massachusetts Institute of Technology. Department of Chemistry
Format:	Article
Language:	English
Published:	Springer International Publishing 2024
Online Access:	https://hdl.handle.net/1721.1/153303

_version_	1826191214574043136
author	Rasmussen, Maria H. Duan, Chenru Kulik, Heather J. Jensen, Jan H.
author2	Massachusetts Institute of Technology. Department of Chemistry
author_facet	Massachusetts Institute of Technology. Department of Chemistry Rasmussen, Maria H. Duan, Chenru Kulik, Heather J. Jensen, Jan H.
author_sort	Rasmussen, Maria H.
collection	MIT
description	With the increasingly important role of machine learning (ML) models in chemical research, the need to attach a level of confidence to model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years, but consensus on how to evaluate them has yet to be established, and different studies on uncertainties generally use different metrics to evaluate them. We compare three of the most popular validation metrics (Spearman’s rank correlation coefficient, the negative log likelihood (NLL) and the miscalibration area) to the error-based calibration introduced by Levi et al. (Sensors 2022, 22, 5540). Importantly, metrics such as the NLL and Spearman’s rank correlation coefficient carry little information in themselves. We therefore introduce reference values obtained through errors simulated directly from the uncertainty distribution. The different metrics target different properties and we show how to interpret them, but we generally find the best overall validation to be the error-based calibration plot introduced by Levi et al. Finally, we illustrate the sensitivity of ranking-based methods (e.g. Spearman’s rank correlation coefficient) towards test set design by using the same toy model on different test sets and obtaining vastly different metrics (0.05 vs. 0.65).
first_indexed	2024-09-23T08:52:44Z
format	Article
id	mit-1721.1/153303
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T08:52:44Z
publishDate	2024
publisher	Springer International Publishing
record_format	dspace
spelling	mit-1721.1/153303 2024-06-28T15:06:41Z Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets Rasmussen, Maria H. Duan, Chenru Kulik, Heather J. Jensen, Jan H. Massachusetts Institute of Technology. Department of Chemistry Massachusetts Institute of Technology. Department of Chemical Engineering With the increasingly important role of machine learning (ML) models in chemical research, the need to attach a level of confidence to model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years, but consensus on how to evaluate them has yet to be established, and different studies on uncertainties generally use different metrics to evaluate them. We compare three of the most popular validation metrics (Spearman’s rank correlation coefficient, the negative log likelihood (NLL) and the miscalibration area) to the error-based calibration introduced by Levi et al. (Sensors 2022, 22, 5540). Importantly, metrics such as the NLL and Spearman’s rank correlation coefficient carry little information in themselves. We therefore introduce reference values obtained through errors simulated directly from the uncertainty distribution. The different metrics target different properties and we show how to interpret them, but we generally find the best overall validation to be the error-based calibration plot introduced by Levi et al. Finally, we illustrate the sensitivity of ranking-based methods (e.g. Spearman’s rank correlation coefficient) towards test set design by using the same toy model on different test sets and obtaining vastly different metrics (0.05 vs. 0.65). 2024-01-10T21:07:54Z 2024-01-10T21:07:54Z 2023-12-18 2023-12-24T04:17:48Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/153303 Journal of Cheminformatics. 2023 Dec 18;15(1):121 PUBLISHER_CC en https://doi.org/10.1186/s13321-023-00790-0 Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/ The Author(s) application/pdf Springer International Publishing Springer International Publishing
spellingShingle	Rasmussen, Maria H. Duan, Chenru Kulik, Heather J. Jensen, Jan H. Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets
title	Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets
title_full	Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets
title_fullStr	Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets
title_full_unstemmed	Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets
title_short	Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets
title_sort	uncertain of uncertainties a comparison of uncertainty quantification metrics for chemical data sets
url	https://hdl.handle.net/1721.1/153303
work_keys_str_mv	AT rasmussenmariah uncertainofuncertaintiesacomparisonofuncertaintyquantificationmetricsforchemicaldatasets AT duanchenru uncertainofuncertaintiesacomparisonofuncertaintyquantificationmetricsforchemicaldatasets AT kulikheatherj uncertainofuncertaintiesacomparisonofuncertaintyquantificationmetricsforchemicaldatasets AT jensenjanh uncertainofuncertaintiesacomparisonofuncertaintyquantificationmetricsforchemicaldatasets
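
The description in this record mentions reference values for the NLL and Spearman's rank correlation coefficient, obtained from errors simulated directly from the uncertainty distribution. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each predicted uncertainty is the standard deviation of a zero-mean Gaussian error (the record does not state the distribution), and the function names gaussian_nll, spearman and simulated_reference are hypothetical.

```python
import numpy as np


def gaussian_nll(errors, sigmas):
    """Average negative log likelihood of errors under N(0, sigma^2).

    Assumption: Gaussian errors; the catalog record does not specify this.
    """
    errors = np.asarray(errors, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    per_point = 0.5 * np.log(2.0 * np.pi * sigmas**2) + errors**2 / (2.0 * sigmas**2)
    return float(np.mean(per_point))


def spearman(a, b):
    """Spearman rank correlation as the Pearson correlation of ranks.

    Assumes no ties, which holds almost surely for continuous data.
    """
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


def simulated_reference(sigmas, n_sim=1000, seed=0):
    """Simulate errors from the predicted uncertainties and summarize metrics.

    Returns (nll_mean, nll_std, rho_mean, rho_std): the values one would
    expect if the uncertainties were exactly right under the Gaussian
    assumption above.
    """
    rng = np.random.default_rng(seed)
    sigmas = np.asarray(sigmas, dtype=float)
    nlls, rhos = [], []
    for _ in range(n_sim):
        sim_errors = rng.normal(0.0, sigmas)  # one draw per predicted sigma
        nlls.append(gaussian_nll(sim_errors, sigmas))
        rhos.append(spearman(np.abs(sim_errors), sigmas))
    return (float(np.mean(nlls)), float(np.std(nlls)),
            float(np.mean(rhos)), float(np.std(rhos)))
```

An observed NLL or Spearman coefficient can then be compared with the simulated mean and spread rather than read in isolation. Even with exactly correct uncertainties the simulated Spearman coefficient stays below 1, because a single absolute error is only a noisy indicator of its underlying standard deviation.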

Similar Items