Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures
Uncertainty quantification for complex deep learning models is increasingly important as these techniques see growing use in high-stakes, real-world settings. Currently, the quality of a model's uncertainty is evaluated using point-prediction metrics, such as the negative log-likelihood (NLL), expected calibration error (ECE) or the Brier score on held-out data. Marginal coverage of prediction intervals or sets, a well-known concept in the statistical literature, is an intuitive alternative to these metrics but has yet to be systematically studied for many popular uncertainty quantification techniques for deep learning models. With marginal coverage and the complementary notion of the width of a prediction interval, downstream users of deployed machine learning models can better understand uncertainty quantification both on a global dataset level and on a per-sample basis. In this study, we provide the first large-scale evaluation of the empirical frequentist coverage properties of well-known uncertainty quantification techniques on a suite of regression and classification tasks. We find that, in general, some methods do achieve desirable coverage properties on *in-distribution* samples, but that coverage is not maintained on out-of-distribution data. Our results demonstrate the failings of current uncertainty quantification techniques as dataset shift increases and reinforce coverage as an important metric in developing models for real-world applications.
Main Authors: | Benjamin Kompa, Jasper Snoek, Andrew L. Beam |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-11-01 |
Series: | Entropy |
Subjects: | uncertainty quantification; coverage; Bayesian methods; dataset shift |
Online Access: | https://www.mdpi.com/1099-4300/23/12/1608 |
Citation: | Entropy 2021, 23(12), 1608; ISSN 1099-4300; DOI: 10.3390/e23121608 |
Author Affiliations: | Benjamin Kompa (Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA); Jasper Snoek (Google Research, Cambridge, MA 02142, USA); Andrew L. Beam (Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA) |
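The abstract above centers on two complementary quantities that downstream users can compute on held-out data: the empirical marginal coverage of prediction intervals or sets (how often the true target actually falls inside them) and their width or size. The following is a minimal Python sketch of how one might estimate both; it is written for this record rather than taken from the paper, it assumes the model already supplies interval endpoints (regression) or per-class probabilities (classification), and the function names and the top-probability-mass construction of prediction sets are illustrative assumptions, not necessarily the authors' procedure.

```python
import numpy as np

def regression_coverage_and_width(y_true, lower, upper):
    """Empirical marginal coverage and mean width of prediction intervals.

    Coverage is the fraction of held-out targets falling inside
    [lower, upper]; width is the average interval length.
    """
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    covered = (lower <= y_true) & (y_true <= upper)
    return covered.mean(), (upper - lower).mean()

def classification_coverage_and_size(y_true, probs, alpha=0.05):
    """Empirical coverage and mean size of prediction sets formed by adding
    classes in order of predicted probability until at least 1 - alpha of
    the probability mass is included (one simple construction; not
    necessarily the one evaluated in the paper).
    """
    probs = np.asarray(probs)    # shape (n_samples, n_classes)
    y_true = np.asarray(y_true)  # integer class labels, shape (n_samples,)
    order = np.argsort(-probs, axis=1)  # class indices, most probable first
    cum_mass = np.cumsum(np.take_along_axis(probs, order, axis=1), axis=1)
    # Smallest k such that the top-k classes reach 1 - alpha probability mass.
    set_sizes = 1 + np.argmax(cum_mass >= 1.0 - alpha, axis=1)
    covered = np.array([y in order[i, :k]
                        for i, (y, k) in enumerate(zip(y_true, set_sizes))])
    return covered.mean(), set_sizes.mean()

# Hypothetical usage for a 95% regression interval built from an ensemble's
# predictive mean mu and standard deviation sigma (a common heuristic):
#   cov, width = regression_coverage_and_width(y_test, mu - 1.96 * sigma, mu + 1.96 * sigma)
```

For a nominal 95% interval, empirical coverage near 0.95 at modest width is the desirable regime; the study's finding is that coverage which looks adequate in-distribution can degrade as dataset shift increases, which is why the authors argue for reporting coverage and width alongside NLL, ECE, or the Brier score.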