Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration

Cited by: 0
Authors
Wang, Yuhao [1 ]
Lak, Armin [2 ]
Manohar, Sanjay G. [3 ]
Bogacz, Rafal [1 ]
Affiliations
[1] Univ Oxford, MRC Brain Network Dynam Unit, Oxford, England
[2] Univ Oxford, Dept Physiol Anat & Genet, Oxford, England
[3] Univ Oxford, Nuffield Dept Clin Neurosci, Oxford, England
Funding
Wellcome Trust (UK); UK Research and Innovation; UK Medical Research Council; UK Biotechnology and Biological Sciences Research Council;
Keywords
STRIATAL DOPAMINE; NEURONS; VARIABILITY; PREDICTION; HUMANS; CHOICE; SYSTEM;
DOI
10.1371/journal.pcbi.1011516
Chinese Library Classification (CLC)
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions provide reward, but also to put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action-reward associations and use them to inform decision making. We propose a novel model whereby the direct and indirect striatal pathways act together to estimate both the mean and variance of reward distributions, and mesolimbic dopaminergic neurons provide transient novelty signals, facilitating effective uncertainty-driven exploration. We used electrophysiological recording data to verify our model of the basal ganglia, and we fitted exploration strategies derived from the neural model to data from behavioural experiments. In simulation, we also compared the performance of directed exploration strategies inspired by our basal ganglia model with other exploration algorithms, including classic variants of the upper confidence bound (UCB) strategy. The exploration strategies inspired by the basal ganglia model achieved overall superior performance in simulation, and fitting the model to behavioural data gave results qualitatively similar to those obtained by fitting more idealised normative models with less implementation-level detail. Overall, our results suggest that transient dopamine levels in the basal ganglia that encode novelty could contribute to an uncertainty representation which efficiently drives exploration in reinforcement learning.

Humans and other animals learn from the rewards and losses resulting from their actions to maximise their chances of survival. In many cases, a trial-and-error process is necessary to determine the most rewarding action in a given context. During this process, deciding how many resources should be allocated to acquiring information ("exploration") and how many to using the existing information to maximise reward ("exploitation") is key to overall effectiveness, i.e., maximising the total reward obtained for a given amount of effort. We propose a theory whereby an area within the mammalian brain called the basal ganglia integrates current knowledge about the mean reward, reward uncertainty and novelty of an action in order to implement an algorithm that optimally allocates resources between exploration and exploitation. We verify our theory using behavioural experiments and electrophysiological recordings, and show in simulations that the model also achieves good performance in comparison with established benchmark algorithms.
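The exploration scheme summarised above lends itself to a compact illustration. The sketch below is a minimal, hypothetical example, not the paper's basal ganglia model: a bandit agent that keeps running estimates of each action's reward mean and variance and adds a UCB-like uncertainty bonus plus a novelty bonus that decays with experience, loosely in the spirit of the uncertainty- and novelty-driven exploration described in the abstract. The class name, parameters (lr, uncertainty_weight, novelty_weight) and update rules are all illustrative assumptions.

```python
# Hypothetical sketch only; not the model from the paper.
import numpy as np

class UncertaintyNoveltyAgent:
    """Toy bandit agent: value = estimated mean + uncertainty bonus + decaying novelty bonus."""

    def __init__(self, n_actions, lr=0.1, uncertainty_weight=1.0, novelty_weight=1.0):
        self.mean = np.zeros(n_actions)    # running estimate of each action's mean reward
        self.var = np.ones(n_actions)      # running estimate of each action's reward variance
        self.counts = np.zeros(n_actions)  # number of times each action has been chosen
        self.lr = lr
        self.uw = uncertainty_weight
        self.nw = novelty_weight

    def choose(self):
        # The novelty bonus decays as an action is sampled, a crude stand-in for a
        # transient novelty signal; the uncertainty bonus is UCB-like.
        novelty = self.nw / (1.0 + self.counts)
        uncertainty = self.uw * np.sqrt(self.var / (1.0 + self.counts))
        return int(np.argmax(self.mean + uncertainty + novelty))

    def update(self, action, reward):
        # Delta-rule updates of the mean and variance estimates.
        delta = reward - self.mean[action]
        self.mean[action] += self.lr * delta
        self.var[action] += self.lr * (delta ** 2 - self.var[action])
        self.counts[action] += 1


# Usage on a toy 3-armed bandit with Gaussian rewards.
rng = np.random.default_rng(0)
true_means = [0.0, 0.5, 1.0]
agent = UncertaintyNoveltyAgent(n_actions=3)
for _ in range(500):
    a = agent.choose()
    r = rng.normal(true_means[a], 1.0)
    agent.update(a, r)
print(agent.mean)  # rough estimates of true_means; most pulls go to arm 2
```

In this toy setting the novelty bonus encourages sampling every arm early on, after which the uncertainty term governs how quickly choices concentrate on the best arm.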
Pages: 27