Signal detection models as contextual bandits

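Signal detection theory (SDT) has been widely applied to identify the optimal discriminative decisions of receivers under uncertainty. However, the approach assumes that decision-makers immediately adopt the appropriate acceptance threshold, even though the optimal response must often be learned. Here we recast the classical normal–normal (and power-law) signal detection model as a contextual multi-armed bandit (CMAB). Thus, rather than starting with complete information, decision-makers must infer how the magnitude of a continuous cue is related to the probability that a signaller is desirable, while simultaneously seeking to exploit the information they acquire. We explain how various CMAB heuristics resolve the trade-off between better estimating the underlying relationship and exploiting it. Next, we determined how naive human volunteers resolve signal detection problems with a continuous cue. As anticipated, a model of choice (accept/reject) that assumed volunteers immediately adopted the SDT-predicted acceptance threshold did not predict volunteer behaviour well. The Softmax rule for solving CMABs, with choices based on a logistic function of the expected payoffs, best explained the decisions of our volunteers but a simple midpoint algorithm also predicted decisions well under some conditions. CMABs offer principled parametric solutions to solving many classical SDT problems when decision-makers start with incomplete information.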
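The abstract contrasts a fixed SDT acceptance threshold with a Softmax rule in which acceptance is a logistic function of expected payoff. The following is a minimal Python sketch of that contrast, not the authors' implementation. It assumes an equal-variance normal–normal model, a payoff of `benefit` for accepting a desirable signaller, `-cost` for accepting an undesirable one, and 0 for rejecting. It also assumes the receiver's payoff estimates already match the true posterior, whereas in the CMAB setting they would be learned from experience. All function and parameter names are hypothetical.

```python
import math


def sdt_threshold(mu_u, mu_d, sigma, p_desirable, benefit, cost):
    """Cue value above which accepting has positive expected payoff.

    Assumes equal-variance normal cue distributions with means mu_u
    (undesirable) and mu_d (desirable), with mu_d > mu_u.
    """
    midpoint = (mu_u + mu_d) / 2.0
    # Shift away from the midpoint according to priors and payoffs.
    odds = ((1.0 - p_desirable) * cost) / (p_desirable * benefit)
    return midpoint + sigma ** 2 / (mu_d - mu_u) * math.log(odds)


def posterior_desirable(x, mu_u, mu_d, sigma, p_desirable):
    """P(desirable | cue x) by Bayes' rule under the same model."""
    # Log-likelihood ratio of desirable versus undesirable at cue x.
    llr = ((x - mu_u) ** 2 - (x - mu_d) ** 2) / (2.0 * sigma ** 2)
    log_odds = math.log(p_desirable / (1.0 - p_desirable)) + llr
    return 1.0 / (1.0 + math.exp(-log_odds))


def softmax_accept_prob(x, mu_u, mu_d, sigma, p_desirable,
                        benefit, cost, temperature):
    """Softmax probability of accepting when rejecting pays 0.

    With two options, Softmax reduces to a logistic function of the
    expected payoff of accepting divided by the temperature.
    """
    p = posterior_desirable(x, mu_u, mu_d, sigma, p_desirable)
    expected_accept = p * benefit - (1.0 - p) * cost
    return 1.0 / (1.0 + math.exp(-expected_accept / temperature))


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the article.
    params = dict(mu_u=0.0, mu_d=2.0, sigma=1.0, p_desirable=0.3,
                  benefit=1.0, cost=2.0)
    t = sdt_threshold(**params)
    for x in (t - 1.0, t, t + 1.0):
        print(round(x, 3), round(softmax_accept_prob(x, temperature=0.2, **params), 3))
```

Under these assumptions, the Softmax acceptance probability is exactly one half at the SDT threshold and approaches a hard threshold rule as `temperature` shrinks, which is one way to see the SDT rule as a limiting case. Very small temperatures can overflow `math.exp` in this simple form.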

Bibliographic Details
Main Authors: Thomas N. Sherratt, Erica O'Neill
Format: Article
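Language: English
Published: The Royal Society 2023-06-01
Series: Royal Society Open Science
Subjects: decision theory; signal detection theory; multi-armed bandit; contextual bandit; Softmax; Thompson sampling
Online Access: https://royalsocietypublishing.org/doi/10.1098/rsos.230157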
_version_ 1797798571349639168
author Thomas N. Sherratt
Erica O'Neill
collection DOAJ
description Signal detection theory (SDT) has been widely applied to identify the optimal discriminative decisions of receivers under uncertainty. However, the approach assumes that decision-makers immediately adopt the appropriate acceptance threshold, even though the optimal response must often be learned. Here we recast the classical normal–normal (and power-law) signal detection model as a contextual multi-armed bandit (CMAB). Thus, rather than starting with complete information, decision-makers must infer how the magnitude of a continuous cue is related to the probability that a signaller is desirable, while simultaneously seeking to exploit the information they acquire. We explain how various CMAB heuristics resolve the trade-off between better estimating the underlying relationship and exploiting it. Next, we determined how naive human volunteers resolve signal detection problems with a continuous cue. As anticipated, a model of choice (accept/reject) that assumed volunteers immediately adopted the SDT-predicted acceptance threshold did not predict volunteer behaviour well. The Softmax rule for solving CMABs, with choices based on a logistic function of the expected payoffs, best explained the decisions of our volunteers but a simple midpoint algorithm also predicted decisions well under some conditions. CMABs offer principled parametric solutions to solving many classical SDT problems when decision-makers start with incomplete information.
first_indexed 2024-03-13T04:05:47Z
format Article
id doaj.art-e85c6570e1ad4b92862d0568dc77f1ad
institution Directory Open Access Journal
issn 2054-5703
language English
last_indexed 2024-03-13T04:05:47Z
publishDate 2023-06-01
publisher The Royal Society
record_format Article
series Royal Society Open Science
spelling doaj.art-e85c6570e1ad4b92862d0568dc77f1ad
2023-06-21T07:05:41Z
eng
The Royal Society
Royal Society Open Science
2054-5703
2023-06-01
10
6
10.1098/rsos.230157
Signal detection models as contextual bandits
Thomas N. Sherratt (Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6)
Erica O'Neill (Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6)
https://royalsocietypublishing.org/doi/10.1098/rsos.230157
decision theory
signal detection theory
multi-armed bandit
contextual bandit
Softmax
Thompson sampling
title Signal detection models as contextual bandits
topic decision theory
signal detection theory
multi-armed bandit
contextual bandit
Softmax
Thompson sampling
url https://royalsocietypublishing.org/doi/10.1098/rsos.230157