Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures
Uncertainty quantification for complex deep learning models is increasingly important as these techniques see growing use in high-stakes, real-world settings. Currently, the quality of a model's uncertainty is evaluated using point-prediction metrics, such as the negative log-likelihood (NLL), expected calibration error (ECE) or the Brier score on held-out data. Marginal coverage of prediction intervals or sets, a well-known concept in the statistical literature, is an intuitive alternative to these metrics but has yet to be systematically studied for many popular uncertainty quantification techniques for deep learning models. With marginal coverage and the complementary notion of the width of a prediction interval, downstream users of deployed machine learning models can better understand uncertainty quantification both on a global dataset level and on a per-sample basis. In this study, we provide the first large-scale evaluation of the empirical frequentist coverage properties of well-known uncertainty quantification techniques on a suite of regression and classification tasks. We find that, in general, some methods do achieve desirable coverage properties on *in-distribution* samples, but that coverage is not maintained on out-of-distribution data. Our results demonstrate the failings of current uncertainty quantification techniques as dataset shift increases and reinforce coverage as an important metric in developing models for real-world applications.
Main Authors: | Benjamin Kompa, Jasper Snoek, Andrew L. Beam |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-11-01 |
Series: | Entropy |
Subjects: | uncertainty quantification; coverage; Bayesian methods; dataset shift |
Online Access: | https://www.mdpi.com/1099-4300/23/12/1608 |
Citation: | Entropy 2021, 23(12), 1608; ISSN 1099-4300; DOI: 10.3390/e23121608 |
Author Affiliations: | Benjamin Kompa (Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA); Jasper Snoek (Google Research, Cambridge, MA 02142, USA); Andrew L. Beam (Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA) |
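The abstract above centers on two complementary quantities that downstream users can compute on held-out data: the empirical marginal coverage of prediction intervals or sets (how often the true target actually falls inside them) and their width or size. The following is a minimal Python sketch of how one might estimate both; it is written for this record rather than taken from the paper, it assumes the model already supplies interval endpoints (regression) or per-class probabilities (classification), and the function names and the top-probability-mass construction of prediction sets are illustrative assumptions, not necessarily the authors' procedure.

```python
import numpy as np

def regression_coverage_and_width(y_true, lower, upper):
    """Empirical marginal coverage and mean width of prediction intervals.

    Coverage is the fraction of held-out targets falling inside
    [lower, upper]; width is the average interval length.
    """
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    covered = (lower <= y_true) & (y_true <= upper)
    return covered.mean(), (upper - lower).mean()

def classification_coverage_and_size(y_true, probs, alpha=0.05):
    """Empirical coverage and mean size of prediction sets formed by adding
    classes in order of predicted probability until at least 1 - alpha of
    the probability mass is included (one simple construction; not
    necessarily the one evaluated in the paper).
    """
    probs = np.asarray(probs)    # shape (n_samples, n_classes)
    y_true = np.asarray(y_true)  # integer class labels, shape (n_samples,)
    order = np.argsort(-probs, axis=1)  # class indices, most probable first
    cum_mass = np.cumsum(np.take_along_axis(probs, order, axis=1), axis=1)
    # Smallest k such that the top-k classes reach 1 - alpha probability mass.
    set_sizes = 1 + np.argmax(cum_mass >= 1.0 - alpha, axis=1)
    covered = np.array([y in order[i, :k]
                        for i, (y, k) in enumerate(zip(y_true, set_sizes))])
    return covered.mean(), set_sizes.mean()

# Hypothetical usage for a 95% regression interval built from an ensemble's
# predictive mean mu and standard deviation sigma (a common heuristic):
#   cov, width = regression_coverage_and_width(y_test, mu - 1.96 * sigma, mu + 1.96 * sigma)
```

For a nominal 95% interval, empirical coverage near 0.95 at modest width is the desirable regime; the study's finding is that coverage which looks adequate in-distribution can degrade as dataset shift increases, which is why the authors argue for reporting coverage and width alongside NLL, ECE, or the Brier score.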