Bayesian reinforcement learning: A basic overview

Times cited: 3
Authors
Kang, Pyungwon [1]
Tobler, Philippe N. [1]
Dayan, Peter [2,3]
Affiliations
[1] Univ Zurich, Dept Econ, Lab Social & Neural Syst Res, Zurich, Switzerland
[2] Max Planck Inst Biol Cybernet, Tubingen, Germany
[3] Univ Tubingen, Tubingen, Germany
Funding
Swiss National Science Foundation
Keywords
Bayesian approach; Reinforcement learning; RETROSPECTIVE REVALUATION; COGNITIVE MAP; ORBITOFRONTAL CORTEX; BACKWARD BLOCKING; EXTINCTION; ATTENTION; MODELS; ASSOCIATIONS; UNCERTAINTY; COMPETITION;
DOI
10.1016/j.nlm.2024.107924
Chinese Library Classification (CLC)
B84 [Psychology]; C [Social Sciences, General]; Q98 [Anthropology];
Subject classification codes
03 ; 0303 ; 030303 ; 04 ; 0402 ;
Abstract
We and other animals learn because there is some aspect of the world about which we are uncertain. This uncertainty arises from initial ignorance, and from changes in the world that we do not perfectly know; the uncertainty often becomes evident when our predictions about the world are found to be erroneous. The Rescorla-Wagner learning rule, which specifies one way that prediction errors can occasion learning, has been hugely influential as a characterization of Pavlovian conditioning and, through its equivalence to the delta rule in engineering, in a much wider class of learning problems. Here, we review the embedding of the Rescorla-Wagner rule in a Bayesian context that is precise about the link between uncertainty and learning, and thereby discuss extensions to such suggestions as the Kalman filter, structure learning, and beyond, that collectively encompass a wider range of uncertainties and accommodate a wider assortment of phenomena in conditioning.
Pages: 8
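
To make the abstract's comparison concrete, below is a minimal sketch (ours, not from the paper) contrasting a single Rescorla-Wagner/delta-rule update with a Kalman-filter update over cue weights, in the spirit of the Bayesian embedding the review describes. The point is that the Kalman gain acts as a per-cue learning rate set by posterior uncertainty rather than fixed in advance. All names and parameter values (alpha, obs_noise, drift) are illustrative assumptions.

```python
import numpy as np

def rescorla_wagner_update(V, x, reward, alpha=0.1):
    """One Rescorla-Wagner (delta-rule) update over cue weights V."""
    prediction = np.dot(V, x)        # summed prediction from the present cues
    delta = reward - prediction      # prediction error ("lambda" minus prediction)
    return V + alpha * delta * x     # present cues updated with a fixed learning rate

def kalman_filter_update(V, P, x, reward, obs_noise=1.0, drift=0.01):
    """One Kalman-filter update: V is the posterior mean, P the posterior covariance."""
    P = P + drift * np.eye(len(V))   # weights are assumed to drift, so uncertainty grows
    pred_var = x @ P @ x + obs_noise
    gain = P @ x / pred_var          # Kalman gain: uncertainty-weighted learning rates
    delta = reward - V @ x           # same prediction error as in the delta rule
    V = V + gain * delta
    P = P - np.outer(gain, x) @ P    # uncertainty shrinks for the observed cue combination
    return V, P

if __name__ == "__main__":
    # Backward-blocking sketch: compound AB+ trials, then A+ trials alone.
    V = np.zeros(2)
    P = np.eye(2)
    for _ in range(20):
        V, P = kalman_filter_update(V, P, np.array([1.0, 1.0]), 1.0)
    for _ in range(20):
        V, P = kalman_filter_update(V, P, np.array([1.0, 0.0]), 1.0)
    print(V)  # the weight for B should have fallen from its post-compound value
```

In this sketch, the off-diagonal terms of P carry learned correlations between cue weights, which is why later A+ training should revalue the absent cue B (retrospective revaluation/backward blocking), something a fixed-rate delta rule cannot do on its own.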