A One-Shot Shift from Explore to Exploit in Monkey Prefrontal Cortex

Cited by: 5
Authors
Achterberg, Jascha [1]
Kadohisa, Mikiko [1]
Watanabe, Kei [2, 3]
Kusunoki, Makoto [1]
Buckley, Mark J. [2]
Duncan, John [1, 2]
Affiliations
[1] Univ Cambridge, MRC Cognit & Brain Sci Unit, Cambridge CB2 7EF, England
[2] Univ Oxford, Dept Expt Psychol, Oxford OX2 6GG, England
[3] Osaka Univ, Grad Sch Frontier Biosci, Osaka 5650871, Japan
Funding
UK Medical Research Council; UK Biotechnology and Biological Sciences Research Council; Wellcome Trust (UK)
Keywords
attention; exploit; explore; frontal cortex; one-shot learning; primate; ANTERIOR CINGULATE; WORKING-MEMORY; MODULATION; REPRESENTATION; CONFIDENCE; PREDICTION; NEURONS; EVENTS
DOI
10.1523/JNEUROSCI.1338-21.2021
Chinese Library Classification (CLC) number
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
Much animal learning is slow, with cumulative changes in behavior driven by reward prediction errors. When the abstract structure of a problem is known, however, both animals and formal learning models can rapidly attach new items to their roles within this structure, sometimes in a single trial. Frontal cortex is likely to play a key role in this process. To examine information seeking and use in a known problem structure, we trained monkeys in an explore/exploit task, requiring the animal first to test objects for their association with reward, then, once rewarded objects were found, to reselect them on further trials for further rewards. Many cells in the frontal cortex showed an explore/exploit preference aligned with one-shot learning in the monkeys' behavior: the population switched from an explore state to an exploit state after a single trial of learning but partially maintained the explore state if an error indicated that learning had failed. Binary switch from explore to exploit was not explained by continuous changes linked to expectancy or prediction error. Explore/exploit preferences were independent for two stages of the trial: object selection and receipt of feedback. Within an established task structure, frontal activity may control the separate processes of explore and exploit, switching in one trial between the two.
Pages: 276-287
Page count: 12