Distinct value computations support rapid sequential decisions

Abstract: The value of the environment determines animals’ motivational states and sets expectations for error-based learning [1–3]. How are values computed? Reinforcement learning systems can store or cache values of states or actions that are learned from experience, or they can compute values using a...

Bibliographic Details
Main Authors:	Andrew Mah, Shannon S. Schiereck, Veronica Bossio, Christine M. Constantinople
Format:	Article
Language:	English
Published:	Nature Portfolio 2023-11-01
Series:	Nature Communications
Online Access:	https://doi.org/10.1038/s41467-023-43250-x

_version_	1827633729727627264
author	Andrew Mah, Shannon S. Schiereck, Veronica Bossio, Christine M. Constantinople
author_sort	Andrew Mah
collection	DOAJ
description	Abstract: The value of the environment determines animals’ motivational states and sets expectations for error-based learning [1–3]. How are values computed? Reinforcement learning systems can store or cache values of states or actions that are learned from experience, or they can compute values using a model of the environment to simulate possible futures [3]. These value computations have distinct trade-offs, and a central question is how neural systems decide which computations to use or whether/how to combine them [4–8]. Here we show that rats use distinct value computations for sequential decisions within single trials. We used high-throughput training to collect statistically powerful datasets from 291 rats performing a temporal wagering task with hidden reward states. Rats adjusted how quickly they initiated trials and how long they waited for rewards across states, balancing effort and time costs against expected rewards. Statistical modeling revealed that animals computed the value of the environment differently when initiating trials versus when deciding how long to wait for rewards, even though these decisions were only seconds apart. Moreover, value estimates interacted via a dynamic learning rate. Our results reveal how distinct value computations interact on rapid timescales, and demonstrate the power of using high-throughput training to understand rich, cognitive behaviors.
first_indexed	2024-03-09T15:03:36Z
format	Article
id	doaj.art-08701fb7e5f447118e45054a7cd744ce
institution	Directory of Open Access Journals
issn	2041-1723
language	English
last_indexed	2024-03-09T15:03:36Z
publishDate	2023-11-01
publisher	Nature Portfolio
record_format	Article
series	Nature Communications
spelling	doaj.art-08701fb7e5f447118e45054a7cd744ce; 2023-11-26T13:45:30Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2023-11-01; 141114; 10.1038/s41467-023-43250-x; Distinct value computations support rapid sequential decisions; Andrew Mah [0]; Shannon S. Schiereck [1]; Veronica Bossio [2]; Christine M. Constantinople [3]; Center for Neural Science, New York University; Center for Neural Science, New York University; Center for Neural Science, New York University; Center for Neural Science, New York University; https://doi.org/10.1038/s41467-023-43250-x
title	Distinct value computations support rapid sequential decisions
url	https://doi.org/10.1038/s41467-023-43250-x

Similar Items