Summary: | The basal ganglia (BG) are thought to play a crucial role in reinforcement learning. Central to the learning mechanism are dopamine D1 and D2 receptors located at cortico-striatal synapses. However, it is still unclear how this dopamine-mediated synaptic plasticity is deployed and coordinated during reward-contingent behavioral changes. Here we propose a computational model of reinforcement learning that uses different thresholds for D1- and D2-mediated synaptic plasticity, each of which is antagonized by dopamine-independent synaptic plasticity. A phasic increase in dopamine release caused by a larger-than-expected reward induces long-term potentiation (LTP) in the direct pathway, whereas a phasic decrease in dopamine release caused by a smaller-than-expected reward induces a cessation of long-term depression (LTD), leading to LTP in the indirect pathway. This learning mechanism can explain the robust behavioral adaptation observed in a location-reward-value-association task in which the monkey makes shorter-latency saccades to rewarding locations. The changes in saccade latency become quicker as the monkey becomes more experienced. This behavior can be explained by a switching mechanism that selectively activates the cortico-striatal circuit. Our model can also simulate the effects of D1 and D2 receptor blockade, and behavioral changes in Parkinson’s disease.
|
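The threshold rule described in the summary can be illustrated with a minimal sketch. It is not the model's actual implementation: the names `Params` and `update_weights`, all numeric values, the fixed step size, and the placement of the D1 threshold above and the D2 threshold below the tonic dopamine level are assumptions made here for illustration. The update is assumed to apply to a single active cortico-striatal synapse in each pathway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    # All values are illustrative assumptions, not taken from the source.
    baseline: float = 1.0   # tonic dopamine level (arbitrary units)
    theta_d1: float = 1.2   # D1 threshold, assumed above baseline
    theta_d2: float = 0.8   # D2 threshold, assumed below baseline
    rate: float = 0.1       # size of one plasticity step


def clip01(x: float) -> float:
    """Keep a synaptic weight inside [0, 1]."""
    return min(1.0, max(0.0, x))


def update_weights(w_direct: float, w_indirect: float, rpe: float,
                   p: Params = Params()) -> tuple[float, float]:
    """One plasticity step for a direct- and an indirect-pathway synapse.

    rpe is the reward prediction error: positive for a larger-than-expected
    reward, negative for a smaller-than-expected reward.
    """
    dopamine = max(0.0, p.baseline + rpe)

    # Direct pathway: D1-mediated LTP above the D1 threshold; otherwise the
    # dopamine-independent LTD that antagonizes it dominates.
    d_direct = p.rate if dopamine > p.theta_d1 else -p.rate

    # Indirect pathway: D2-mediated LTD above the D2 threshold; when dopamine
    # dips below it, LTD ceases and dopamine-independent LTP dominates.
    d_indirect = -p.rate if dopamine > p.theta_d2 else p.rate

    return clip01(w_direct + d_direct), clip01(w_indirect + d_indirect)


if __name__ == "__main__":
    print("burst:", update_weights(0.5, 0.5, rpe=+0.5))
    print("dip:  ", update_weights(0.5, 0.5, rpe=-0.5))
```

Under these assumptions a dopamine burst strengthens the direct-pathway synapse and weakens the indirect one, while a dopamine dip does the reverse, which matches the qualitative pattern stated in the summary.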