Action-modulated midbrain dopamine activity arises from distributed control policies

Cited by: 0
Authors
Lindsey, Jack [1]
Litwin-Kumar, Ashok [1]
Affiliations
[1] Columbia Univ, Dept Neurosci, New York, NY 10027 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Keywords
NEURAL-NETWORK MODEL; BASAL GANGLIA; STRIATAL DOPAMINE; DORSAL STRIATUM; MOTOR CORTEX; PREDICTION; NEURONS; COMPUTATIONS; STIMULATION; CEREBELLUM;
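DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract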
Animal behavior is driven by multiple brain regions working in parallel with distinct control policies. We present a biologically plausible model of off-policy reinforcement learning in the basal ganglia, which enables learning in such an architecture. The model accounts for action-related modulation of dopamine activity that is not captured by previous models that implement on-policy algorithms. In particular, the model predicts that dopamine activity signals a combination of reward prediction error (as in classic models) and "action surprise," a measure of how unexpected an action is relative to the basal ganglia's current policy. In the presence of the action surprise term, the model implements an approximate form of Q-learning. On benchmark navigation and reaching tasks, we show empirically that this model is capable of learning from data driven completely or in part by other policies (e.g., from other brain regions). By contrast, models without the action surprise term suffer in the presence of additional policies, and are incapable of learning at all from behavior that is completely externally driven. The model provides a computational account for numerous experimental findings about dopamine activity that cannot be explained by classic models of reinforcement learning in the basal ganglia. These include differing levels of action surprise signals in dorsal and ventral striatum, decreasing amounts of movement-modulated dopamine activity with practice, and representations of action initiation and kinematics in dopamine activity. It also provides further predictions that can be tested with recordings of striatal dopamine activity.
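The abstract's point that an off-policy learner can learn from behavior driven entirely by another policy can be illustrated with standard tabular Q-learning. The sketch below is not the paper's basal ganglia model and does not implement its action surprise term; the chain environment, the function name `q_learning_from_external_policy`, and all hyperparameters are assumptions chosen only for illustration.

```python
import random


def q_learning_from_external_policy(n_states=5, episodes=2000,
                                    alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    Actions are chosen by a uniform random "external" policy that never
    consults the learned values, so all learning is off-policy.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[s][a]: a=0 moves left (blocked at state 0), a=1 moves right.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.randrange(2)  # externally driven action (assumption)
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Off-policy target: bootstrap from the best next action,
            # regardless of what the external policy will do next.
            bootstrap = 0.0 if s_next == goal else gamma * max(q[s_next])
            q[s][a] += alpha * (r + bootstrap - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    q_table = q_learning_from_external_policy()
    for state, (left, right) in enumerate(q_table[:-1]):
        print(f"state {state}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

Because the update bootstraps from the maximum over next-state actions rather than from the action the behavior policy actually takes, the learned values favor moving toward the goal even though the learner never controls behavior.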
Pages: 14
Related papers
67 in total
  • [1] Prefrontal Cortex-Driven Dopamine Signals in the Striatum Show Unique Spatial and Pharmacological Properties
    Adrover, Martin F.
    Shin, Jung Hoon
    Quiroz, Cesar
    Ferre, Sergi
    Lemos, Julia C.
    Alvarez, Veronica A.
    [J]. JOURNAL OF NEUROSCIENCE, 2020, 40 (39) : 7510 - 7522
  • [2] [Anonymous], 1995, Models of information processing in the basal ganglia
  • [3] Deep Reinforcement Learning A brief survey
    Arulkumaran, Kai
    Deisenroth, Marc Peter
    Brundage, Miles
    Bharath, Anil Anthony
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) : 26 - 38
  • [4] Cortical and basal ganglia contributions to habit learning and automaticity
    Ashby, F. Gregory
    Turner, Benjamin O.
    Horvitz, Jon C.
    [J]. TRENDS IN COGNITIVE SCIENCES, 2010, 14 (05) : 208 - 215
  • [5] Beyond reward prediction errors: the role of dopamine in movement kinematics
    Barter, Joseph W.
    Li, Suellen
    Lu, Dongye
    Bartholomew, Ryan A.
    Rossi, Mark A.
    Shoemaker, Charles T.
    Salas-Meza, Daniel
    Gaidis, Erin
    Yin, Henry H.
    [J]. FRONTIERS IN INTEGRATIVE NEUROSCIENCE, 2015, 9
  • [6] The basal ganglia and the cerebellum: nodes in an integrated network
    Bostan, Andreea C.
    Strick, Peter L.
    [J]. NATURE REVIEWS NEUROSCIENCE, 2018, 19 (06) : 338 - 350
  • [7] The basal ganglia communicate with the cerebellum
    Bostan, Andreea C.
    Dum, Richard P.
    Strick, Peter L.
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (18) : 8452 - 8456
  • [8] STRIATAL DOPAMINE AFTER CORTICAL INJURY
    BOYESON, MG
    FEENEY, DM
    [J]. EXPERIMENTAL NEUROLOGY, 1985, 89 (02) : 479 - 483
  • [9] Brown J, 1999, J NEUROSCI, V19, P10502
  • [10] Dopamine and cAMP-regulated phosphoprotein 32 kDa controls both striatal long-term depression and long-term potentiation, opposing forms of synaptic plasticity
    Calabresi, P
    Gubellini, P
    Centonze, D
    Picconi, B
    Bernardi, G
    Chergui, K
    Svenningsson, P
    Fienberg, AA
    Greengard, P
    [J]. JOURNAL OF NEUROSCIENCE, 2000, 20 (22) : 8443 - 8451