Raising the bar on the evaluation of out-of-distribution detection

In image classification, considerable progress has been made in detecting out-of-distribution (OoD) data. However, most OoD detection methods are evaluated on a standard set of datasets that are arbitrarily different from the training data, and there is no clear definition of what constitutes a "good" OoD dataset. Furthermore, state-of-the-art OoD detection methods already achieve near-perfect results on these standard benchmarks. In this paper, we define two categories of OoD data using the subtly different concepts of perceptual/visual and semantic similarity to in-distribution (iD) data: Near OoD samples are perceptually similar to but semantically different from iD samples, while Shifted samples are visually different from but semantically akin to iD data. We then propose a GAN-based framework for generating OoD samples from each of these two categories, given an iD dataset. Through extensive experiments on MNIST, CIFAR-10/100 and ImageNet, we show that (a) state-of-the-art OoD detection methods that perform exceedingly well on conventional benchmarks are significantly less robust on our proposed benchmark, and (b) models that perform well in our setup also perform well on conventional real-world OoD detection benchmarks, and vice versa, indicating that a separate OoD set may not even be needed to reliably evaluate OoD detection performance.

Bibliographic Details
Main Authors: Mukhoti, J, Lin, T-Y, Chen, B-C, Shah, A, Torr, PHS, Dokania, PK, Lim, S-N
Format: Conference item
Language: English
Published: IEEE 2023
Institution: University of Oxford
Collection: OXFORD
Record ID: oxford-uuid:f22c31df-9cbb-4292-a1a0-8faaa5c39028