Mesolimbic dopamine adapts the rate of learning from action

Times cited: 40
Authors
Coddington, Luke T. [1]
Lindo, Sarah E. [1]
Dudman, Joshua T. [1]
Affiliations
[1] Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA 20147, USA
Keywords
BASAL GANGLIA; NEURONS; SIGNALS; MODEL; STRIATUM; REWARD; TIME; PLASTICITY; PREDICTION; MOVEMENT
DOI
10.1038/s41586-022-05614-z
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract
Recent success in training artificial agents and robots derives from a combination of direct learning of behavioural policies and indirect learning through value functions(1-3). Policy learning and value learning use distinct algorithms that optimize behavioural performance and reward prediction, respectively. In animals, behavioural learning and the role of mesolimbic dopamine signalling have been extensively evaluated with respect to reward prediction(4); however, so far there has been little consideration of how direct policy learning might inform our understanding(5). Here we used a comprehensive dataset of orofacial and body movements to understand how behavioural policies evolved as naive, head-restrained mice learned a trace conditioning paradigm. Individual differences in initial dopaminergic reward responses correlated with the emergence of learned behavioural policy, but not the emergence of putative value encoding for a predictive cue. Likewise, physiologically calibrated manipulations of mesolimbic dopamine produced several effects inconsistent with value learning but predicted by a neural-network-based model that used dopamine signals to set an adaptive rate, not an error signal, for behavioural policy learning. This work provides strong evidence that phasic dopamine activity can regulate direct learning of behavioural policies, expanding the explanatory power of reinforcement learning models for animal learning(6). Analysis of data collected from mice learning a trace conditioning paradigm shows that phasic dopamine activity in the brain can regulate direct learning of behavioural policies, and dopamine sets an adaptive learning rate rather than an error-like teaching signal.
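To make the abstract's distinction concrete, the following is a minimal, hypothetical Python sketch (it is not the authors' neural-network-based model). It contrasts (a) value learning, in which a dopamine-like reward-prediction error is itself the teaching signal, with (b) policy learning, in which a dopamine-like phasic signal only scales the step size of a REINFORCE-style update. The decaying dopamine signal, the reward contingency and all parameter values are illustrative assumptions.

```python
# Illustrative sketch contrasting two roles for a phasic dopamine-like signal.
# (a) Value learning: the signal is a reward-prediction error that directly
#     updates cue value.
# (b) Policy learning: the signal sets an adaptive learning rate for a
#     REINFORCE-style policy update; it is not the error term itself.
# All names and parameter values here are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(0)

n_trials = 500
reward = 1.0                # reward magnitude on each trial

# --- (a) Value learning: dopamine-like signal acts as the TD/RPE error ------
V = 0.0                     # learned value of the predictive cue
alpha_v = 0.1               # fixed learning rate
for _ in range(n_trials):
    rpe = reward - V        # prediction error at reward time
    V += alpha_v * rpe      # the error signal drives the update directly

# --- (b) Policy learning: dopamine-like signal scales the learning rate -----
theta = 0.0                 # logit of the probability of anticipatory licking
p_lick = 0.5
for t in range(n_trials):
    p_lick = 1.0 / (1.0 + np.exp(-theta))        # sigmoid policy
    lick = rng.random() < p_lick                 # sample an action
    # Simplification: assume the reward is only collected when the animal licks,
    # so the policy gradient has a nonzero expectation.
    ret = reward if lick else 0.0
    grad = (1.0 - p_lick) if lick else -p_lick   # d log pi(action) / d theta
    # Assumed phasic dopamine response that decays as the reward becomes
    # familiar; it multiplies the step size rather than replacing the return.
    dopamine = 1.0 / (1.0 + 0.05 * t)
    theta += dopamine * ret * grad

print(f"final cue value (a): {V:.3f}")
print(f"final lick probability (b): {p_lick:.3f}")
```

In sketch (b) the dopamine-like term could be set to a constant without changing what is learned, only how fast; in sketch (a) removing the error term abolishes learning entirely. This is the qualitative distinction the abstract draws between an adaptive learning rate and an error-like teaching signal.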
Pages: 294+
Number of pages: 29
Related papers
69 records in total
  • [1] Akaike, H. (1998). Selected Papers of Hirotugu Akaike, p. 199. DOI: 10.1007/978-1-4612-1694-0_15
  • [2] Amo, R., Matias, S., Yamanaka, A., Tanaka, K. F., Uchida, N. & Watabe-Uchida, M. (2022). A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning. Nature Neuroscience, 25(8), 1082+.
  • [3] Arbuthnott, G. W. & Wickens, J. (2007). Space, time and dopamine. Trends in Neurosciences, 30(2), 62-69.
  • [4] Bennett, D., Niv, Y. & Langdon, A. J. (2021). Value-free reinforcement learning: policy optimization as a minimal model of operant behavior. Current Opinion in Behavioral Sciences, 41, 114-121.
  • [5] Berridge, K. C., Robinson, T. E. & Aldridge, J. W. (2009). Dissecting components of reward: 'liking', 'wanting', and learning. Current Opinion in Pharmacology, 9(1), 65-73.
  • [6] Bottou, L., Curtis, F. E. & Nocedal, J. (2018). Optimization methods for large-scale machine learning. SIAM Review, 60(2), 223-311.
  • [7] Bova, A., Gaidica, M., Hurst, A., Iwai, Y., Hunter, J. & Leventhal, D. K. (2020). Precisely timed dopamine signals establish distinct kinematic representations of skilled movements. eLife, 9, 1-141.
  • [8] Brown, H. D., McCutcheon, J. E., Cone, J. J., Ragozzino, M. E. & Roitman, M. F. (2011). Primary food reward and reward-predictive stimuli evoke different patterns of phasic dopamine signaling throughout the striatum. European Journal of Neuroscience, 34(12), 1997-2006.
  • [9] Coddington, L. T. (2021). Methods in Molecular Biology, 2188, 273. DOI: 10.1007/978-1-0716-0818-0_14
  • [10] Coddington, L. T. & Dudman, J. T. (2019). Learning from action: reconsidering movement signaling in midbrain dopamine neuron activity. Neuron, 104(1), 63-77.