Reinforcement Learning or Active Inference?

Cited by: 276
Authors
Friston, Karl J.
Daunizeau, Jean
Kiebel, Stefan J.
Affiliation
[1] The Wellcome Trust Centre for Neuroimaging, University College London, London
Source
PLOS ONE | 2009 / Volume 4 / Issue 07
Funding
Wellcome Trust (UK);
Keywords
FREE-ENERGY; MODELS; DOPAMINE; UNCERTAINTY; PERCEPTION; PREDICTION; PRINCIPLE; RESPONSES; SYSTEMS; SIGNAL;
DOI
10.1371/journal.pone.0006421
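Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general];
Discipline classification code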
07 ; 0710 ; 09 ;
Abstract
This paper questions the need for reinforcement learning or control theory when optimising behaviour. We show that it is fairly simple to teach an agent complicated and adaptive behaviours using a free-energy formulation of perception. In this formulation, agents adjust their internal states and sampling of the environment to minimize their free-energy. Such agents learn causal structure in the environment and sample it in an adaptive and self-supervised fashion. This results in behavioural policies that reproduce those optimised by reinforcement learning and dynamic programming. Critically, we do not need to invoke the notion of reward, value or utility. We illustrate these points by solving a benchmark problem in dynamic programming; namely the mountain-car problem, using active perception or inference under the free-energy principle. The ensuing proof-of-concept may be important because the free-energy formulation furnishes a unified account of both action and perception and may speak to a reappraisal of the role of dopamine in the brain.
Pages: 13