Distinct value computations support rapid sequential decisions


Bibliographic Details
Main Authors: Andrew Mah, Shannon S. Schiereck, Veronica Bossio, Christine M. Constantinople
Format: Article
Language: English
Published: Nature Portfolio, 2023-11-01
Series: Nature Communications
Online Access:https://doi.org/10.1038/s41467-023-43250-x
Abstract
The value of the environment determines animals’ motivational states and sets expectations for error-based learning[1–3]. How are values computed? Reinforcement learning systems can store or cache values of states or actions that are learned from experience, or they can compute values using a model of the environment to simulate possible futures[3]. These value computations have distinct trade-offs, and a central question is how neural systems decide which computations to use or whether and how to combine them[4–8]. Here we show that rats use distinct value computations for sequential decisions within single trials. We used high-throughput training to collect statistically powerful datasets from 291 rats performing a temporal wagering task with hidden reward states. Rats adjusted how quickly they initiated trials and how long they waited for rewards across states, balancing effort and time costs against expected rewards. Statistical modeling revealed that animals computed the value of the environment differently when initiating trials versus when deciding how long to wait for rewards, even though these decisions were only seconds apart. Moreover, value estimates interacted via a dynamic learning rate. Our results reveal how distinct value computations interact on rapid timescales and demonstrate the power of using high-throughput training to understand rich, cognitive behaviors.
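The abstract contrasts values that are cached from experience with values computed from a model of the environment. The sketch below is a generic illustration of that distinction under stated assumptions, not the authors' model: the two hidden states ("high", "low"), their reward sets, the learning rate `alpha`, and the switch probability `hazard` are all hypothetical. The cached estimate is updated with a delta rule; the model-based estimate is the expected reward under a Bayesian belief about which hidden state is active. The dynamic learning rate through which the paper reports the estimates interact is not modeled here.

```python
# Generic illustration of cached (model-free) vs model-based value estimates.
# All state names, reward sets, and parameters are hypothetical.

# Hypothetical hidden reward states and the reward sizes each can deliver.
REWARDS_BY_STATE: dict[str, tuple[float, ...]] = {
    "high": (20.0, 40.0, 80.0),
    "low": (5.0, 10.0, 20.0),
}


def cached_update(value: float, reward: float, alpha: float) -> float:
    """Cached value: move the stored estimate toward the observed reward
    by a fraction alpha of the prediction error (delta rule)."""
    return value + alpha * (reward - value)


def predict_belief(belief: dict[str, float], hazard: float) -> dict[str, float]:
    """Allow for a switch of hidden state with probability `hazard`,
    spread evenly over the other states."""
    n = len(belief)
    return {s: (1 - hazard) * p + hazard * (1 - p) / (n - 1) for s, p in belief.items()}


def update_belief(belief: dict[str, float], reward: float) -> dict[str, float]:
    """Bayes' rule: posterior is proportional to prior times P(reward | state),
    with each state's rewards assumed equally likely."""
    unnormalized = {
        s: p * (1 / len(REWARDS_BY_STATE[s]) if reward in REWARDS_BY_STATE[s] else 0.0)
        for s, p in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError(f"reward {reward} is impossible under the current belief")
    return {s: u / total for s, u in unnormalized.items()}


def model_based_value(belief: dict[str, float]) -> float:
    """Model-based value: expected reward under the belief, using the
    internal model of each state's mean reward."""
    return sum(
        p * sum(REWARDS_BY_STATE[s]) / len(REWARDS_BY_STATE[s]) for s, p in belief.items()
    )


def run(rewards: list[float], alpha: float = 0.5, hazard: float = 0.1) -> list[tuple[float, float]]:
    """Track both estimates over a reward sequence; returns (cached, model_based) pairs."""
    value = 0.0
    belief = {s: 1 / len(REWARDS_BY_STATE) for s in REWARDS_BY_STATE}
    history = []
    for r in rewards:
        value = cached_update(value, r, alpha)
        belief = update_belief(predict_belief(belief, hazard), r)
        history.append((value, model_based_value(belief)))
    return history
```

For example, `run([40, 80, 5])` returns approximately `[(20.0, 46.67), (50.0, 46.67), (27.5, 11.67)]`. Once the small reward reveals a switch to the low state, the model-based estimate drops at once, while the cached estimate still carries the earlier large rewards.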
ISSN: 2041-1723
Affiliation (all authors): Center for Neural Science, New York University