Signal detection models as contextual bandits

Signal detection theory (SDT) has been widely applied to identify the optimal discriminative decisions of receivers under uncertainty. However, the approach assumes that decision-makers immediately adopt the appropriate acceptance threshold, even though the optimal response must often be learned. Here we recast the classical normal–normal (and power-law) signal detection model as a contextual multi-armed bandit (CMAB). Thus, rather than starting with complete information, decision-makers must infer how the magnitude of a continuous cue is related to the probability that a signaller is desirable, while simultaneously seeking to exploit the information they acquire. We explain how various CMAB heuristics resolve the trade-off between better estimating the underlying relationship and exploiting it. Next, we determined how naive human volunteers resolve signal detection problems with a continuous cue. As anticipated, a model of choice (accept/reject) that assumed volunteers immediately adopted the SDT-predicted acceptance threshold did not predict volunteer behaviour well. The Softmax rule for solving CMABs, with choices based on a logistic function of the expected payoffs, best explained the decisions of our volunteers but a simple midpoint algorithm also predicted decisions well under some conditions. CMABs offer principled parametric solutions to solving many classical SDT problems when decision-makers start with incomplete information.

Bibliographic Details
Main Authors:	Thomas N. Sherratt, Erica O'Neill
Affiliation:	Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6 (both authors)
Format:	Article
Language:	English
Published:	The Royal Society 2023-06-01
Series:	Royal Society Open Science, vol. 10, no. 6
ISSN:	2054-5703
DOI:	10.1098/rsos.230157
Subjects:	decision theory; signal detection theory; multi-armed bandit; contextual bandit; Softmax; Thompson sampling
Online Access:	https://royalsocietypublishing.org/doi/10.1098/rsos.230157
Collection:	DOAJ
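The abstract contrasts a fixed SDT acceptance threshold with a Softmax rule that chooses via a logistic function of expected payoffs while the relationship between cue and desirability is still being learned. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' code or model. It assumes an equal-variance normal–normal model with hypothetical parameters (MU_D, MU_U, SIGMA, P_DESIRABLE), a payoff BENEFIT for accepting a desirable signaller, a loss COST for accepting an undesirable one and zero for rejecting. It also assumes a learner that estimates P(desirable | cue) by online logistic regression and sees the outcome only when it accepts. Accepting beats rejecting when P(desirable | x) > COST / (BENEFIT + COST), which for equal-variance normals yields a closed-form cutoff on the cue. All function names (sdt_threshold, posterior_desirable, softmax_accept_prob, run_softmax_learner), the temperature tau and the learning rate lr are illustrative.

```python
import math
import random

# Hypothetical parameters (not from the paper) for an equal-variance
# normal-normal SDT model: cues of desirable signallers ~ N(MU_D, SIGMA^2),
# cues of undesirable signallers ~ N(MU_U, SIGMA^2), with MU_D > MU_U.
MU_D, MU_U, SIGMA = 1.0, -1.0, 1.0
P_DESIRABLE = 0.5   # prior proportion of desirable signallers
BENEFIT = 1.0       # payoff for accepting a desirable signaller
COST = 1.0          # loss for accepting an undesirable one (rejecting pays 0)


def _logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sdt_threshold(mu_d, mu_u, sigma, p, b, c):
    """Fixed SDT acceptance threshold: accept when the cue exceeds this value.

    Accepting beats rejecting when P(desirable | x) > c / (b + c); for
    equal-variance normals this gives a closed-form cutoff on x.
    """
    midpoint = (mu_d + mu_u) / 2.0
    return midpoint + sigma ** 2 / (mu_d - mu_u) * math.log(c * (1 - p) / (b * p))


def posterior_desirable(x, mu_d, mu_u, sigma, p):
    """True P(desirable | x), which is logistic in x for equal-variance normals."""
    slope = (mu_d - mu_u) / sigma ** 2
    intercept = math.log(p / (1 - p)) - slope * (mu_d + mu_u) / 2.0
    return _logistic(intercept + slope * x)


def softmax_accept_prob(q, b, c, tau):
    """Softmax over {accept, reject} given an estimated P(desirable) q.

    With two options this reduces to a logistic function of the difference
    in expected payoffs, scaled by the temperature tau.
    """
    ev_accept = q * b - (1 - q) * c
    ev_reject = 0.0
    return _logistic((ev_accept - ev_reject) / tau)


def run_softmax_learner(n_trials, tau, lr, rng):
    """Hypothetical CMAB learner: online logistic regression plus Softmax choice.

    Assumes the signaller's type is revealed only after it is accepted, so the
    estimate of P(desirable | x) is updated on accepted trials only.
    Returns the learned intercept, slope and the total payoff collected.
    """
    w0, w1 = 0.0, 0.0  # start with no knowledge: q = 0.5 for every cue
    total = 0.0
    for _ in range(n_trials):
        desirable = rng.random() < P_DESIRABLE
        x = rng.gauss(MU_D if desirable else MU_U, SIGMA)
        q = _logistic(w0 + w1 * x)
        if rng.random() < softmax_accept_prob(q, BENEFIT, COST, tau):
            total += BENEFIT if desirable else -COST
            # Stochastic gradient ascent on the Bernoulli log-likelihood.
            err = (1.0 if desirable else 0.0) - q
            w0 += lr * err
            w1 += lr * err * x
    return w0, w1, total


print("SDT threshold:", sdt_threshold(MU_D, MU_U, SIGMA, P_DESIRABLE, BENEFIT, COST))
w0, w1, total = run_softmax_learner(5000, tau=0.2, lr=0.05, rng=random.Random(1))
print("learned slope:", round(w1, 2), "total payoff:", total)
```

With these defaults the fixed SDT threshold is 0.0, halfway between the two cue means. The Softmax learner instead starts indifferent, accepting every signaller with probability 0.5, and behaves more like a threshold rule only as its logistic estimate sharpens and tau shrinks. Under the equal-variance assumption the true posterior is itself logistic in the cue, which is why a logistic-regression learner is a natural (assumed) choice for this sketch.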