Coexistence of reward and unsupervised learning during the operant conditioning of neural firing rates.

A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide a...

Bibliographic Details
Main Authors: Robert R Kerr, David B Grayden, Doreen A Thomas, Matthieu Gilson, Anthony N Burkitt
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24475240/?tool=EBI
collection DOAJ
description A fundamental goal of neuroscience is to understand how cognitive processes, such as operant conditioning, are performed by the brain. Typical and well studied examples of operant conditioning, in which the firing rates of individual cortical neurons in monkeys are increased using rewards, provide an opportunity for insight into this. Studies of reward-modulated spike-timing-dependent plasticity (RSTDP), and of other models such as R-max, have reproduced this learning behavior, but they have assumed that no unsupervised learning is present (i.e., no learning occurs without, or independent of, rewards). We show that these models cannot elicit firing rate reinforcement while exhibiting both reward learning and ongoing, stable unsupervised learning. To fix this issue, we propose a new RSTDP model of synaptic plasticity based upon the observed effects that dopamine has on long-term potentiation and depression (LTP and LTD). We show, both analytically and through simulations, that our new model can exhibit unsupervised learning and lead to firing rate reinforcement. This requires that the strengthening of LTP by the reward signal is greater than the strengthening of LTD and that the reinforced neuron exhibits irregular firing. We show the robustness of our findings to spike-timing correlations, to the synaptic weight dependence that is assumed, and to changes in the mean reward. We also consider our model in the differential reinforcement of two nearby neurons. Our model aligns more strongly with experimental studies than previous models and makes testable predictions for future experiments.
first_indexed 2024-12-17T15:29:36Z
format Article
id doaj.art-75a7409e8d204eafa2be0aaf25ff03aa
institution Directory Open Access Journal
issn 1932-6203
spelling doaj.art-75a7409e8d204eafa2be0aaf25ff03aa; 2022-12-21T21:43:11Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2014-01-01; volume 9, issue 1, e87123; doi:10.1371/journal.pone.0087123; https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24475240/?tool=EBI