Learning and exploration in action-perception loops

Cited by: 57
Authors
Little, Daniel Y. [1 ]
Sommer, Friedrich T. [2 ]
Affiliations
[1] Univ Calif Berkeley, Redwood Ctr Theoret Neurosci, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Redwood Ctr Theoret Neurosci, Helen Wills Neurosci Inst, Berkeley, CA 94720 USA
Funding
U.S. National Science Foundation;
Keywords
knowledge acquisition; information theory; control theory; machine learning; behavioral psychology; computational neuroscience; INFORMATION; CURIOSITY; BEHAVIOR; DRIVE;
DOI
10.3389/fncir.2013.00037
Chinese Library Classification
Q189 [Neuroscience];
Discipline code
071006;
Abstract
Discovering the structure underlying observed data is a recurring problem in machine learning with important applications in neuroscience. It is also a primary function of the brain. When data can be actively collected in the context of a closed action-perception loop, behavior becomes a critical determinant of learning efficiency. Psychologists studying exploration and curiosity in humans and animals have long argued that learning itself is a primary motivator of behavior. However, the theoretical basis of learning-driven behavior is not well understood. Previous computational studies of behavior have largely focused on the control problem of maximizing acquisition of rewards and have treated learning the structure of data as a secondary objective. Here, we study exploration in the absence of external reward feedback. Instead, we take the quality of an agent's learned internal model to be the primary objective. In a simple probabilistic framework, we derive a Bayesian estimate for the amount of information about the environment an agent can expect to receive by taking an action, a measure we term the predicted information gain (PIG). We develop exploration strategies that approximately maximize PIG. One strategy based on value-iteration consistently learns faster than previously developed reward-free exploration strategies across a diverse range of environments. Psychologists believe the evolutionary advantage of learning-driven exploration lies in the generalized utility of an accurate internal model. Consistent with this hypothesis, we demonstrate that agents that learn more efficiently during exploration are later better able to accomplish a range of goal-directed tasks. We conclude by discussing how our work elucidates the explorative behaviors of animals and humans, its relationship to other computational models of behavior, and its potential application to experimental design, such as in closed-loop neurophysiology studies.
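The PIG measure described in the abstract can be made concrete with a small sketch. Assuming a discrete environment whose transition probabilities are modeled with Dirichlet pseudo-counts (a common choice for this kind of Bayesian agent; the function and variable names below are my own, not the paper's), the predicted information gain of an action is the expected KL divergence between the model after one hypothetical observation and the current model:

```python
import numpy as np

def kl(p, q):
    """KL divergence D_KL(p || q) between two discrete distributions."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def predicted_information_gain(counts, s, a):
    """PIG-style estimate for taking action a in state s.

    counts[s, a, s'] holds Dirichlet pseudo-counts over next states.
    Returns the expected KL divergence between the posterior-mean
    transition model after one hypothetical observation and the
    current model, where the expectation is taken under the current
    model's own predictions.
    """
    alpha = counts[s, a].astype(float)
    current = alpha / alpha.sum()            # current posterior-mean model
    pig = 0.0
    for s_next, p in enumerate(current):     # expectation over predicted outcomes
        updated = alpha.copy()
        updated[s_next] += 1.0               # hypothetically observe s -> s_next
        pig += p * kl(updated / updated.sum(), current)
    return pig
```

A greedy explorer would pick `argmax` of this quantity over actions; the paper's stronger value-iteration strategy instead propagates expected gains over multi-step futures, which this one-step sketch does not attempt. Note that PIG shrinks as counts grow: well-sampled transitions yield little expected gain, which is what drives the agent toward unexplored parts of the environment.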
Pages: 19