Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures

Uncertainty quantification for complex deep learning models is increasingly important as these techniques see growing use in high-stakes, real-world settings. Currently, the quality of a model’s uncertainty is evaluated using point-prediction metrics, such as the negative log-likelihood (NLL), expected calibration error (ECE) or the Brier score on held-out data. Marginal coverage of prediction intervals or sets, a well-known concept in the statistical literature, is an intuitive alternative to these metrics but has yet to be systematically studied for many popular uncertainty quantification techniques for deep learning models. With marginal coverage and the complementary notion of the width of a prediction interval, downstream users of deployed machine learning models can better understand uncertainty quantification both on a global dataset level and on a per-sample basis. In this study, we provide the first large-scale evaluation of the empirical frequentist coverage properties of well-known uncertainty quantification techniques on a suite of regression and classification tasks. We find that, in general, some methods do achieve desirable coverage properties on in-distribution samples, but that coverage is not maintained on out-of-distribution data. Our results demonstrate the failings of current uncertainty quantification techniques as dataset shift increases and reinforce coverage as an important metric in developing models for real-world applications.
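
As a rough illustration of the two quantities the abstract refers to, the Python sketch below computes empirical marginal coverage and average interval width for a batch of regression prediction intervals. The function names and toy numbers are illustrative assumptions, not code or data from the paper.

    # Illustrative sketch only (assumed helper names and toy data).
    import numpy as np

    def empirical_coverage(y_true, lower, upper):
        """Fraction of held-out targets that fall inside their prediction intervals."""
        y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
        return float(np.mean((y_true >= lower) & (y_true <= upper)))

    def average_width(lower, upper):
        """Mean width of the prediction intervals (the complementary notion of width)."""
        return float(np.mean(np.asarray(upper) - np.asarray(lower)))

    # A well-calibrated 95% interval procedure should cover roughly 95% of targets.
    y  = np.array([1.2, 0.7, 3.4, 2.1])
    lo = np.array([0.5, 0.2, 2.0, 2.5])
    hi = np.array([2.0, 1.5, 4.0, 3.0])
    print(empirical_coverage(y, lo, hi))  # 0.75 on this toy batch
    print(average_width(lo, hi))          # 1.325

For classification, the analogous check counts how often the true label falls inside the predicted label set, with set size taking the place of interval width.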

Bibliographic Details
Main Authors: Benjamin Kompa, Jasper Snoek, Andrew L. Beam
Format: Article
Language: English
Published: MDPI AG, 2021-11-01
Series: Entropy
ISSN: 1099-4300
DOI: 10.3390/e23121608
Author Affiliations: Benjamin Kompa and Andrew L. Beam (Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA); Jasper Snoek (Google Research, Cambridge, MA 02142, USA)
Subjects: uncertainty quantification; coverage; Bayesian methods; dataset shift
Online Access: https://www.mdpi.com/1099-4300/23/12/1608