Novelty detection improves performance of reinforcement learners in fluctuating, partially observable environments

Cited by: 3
Author
Marzen, Sarah E. [1, 2, 3, 4]
Affiliations
[1] MIT, Dept Phys, Phys Living Syst Grp, Cambridge, MA 02139 USA
[2] Claremont McKenna Coll, WM Keck Sci Dept, Claremont, CA 91711 USA
[3] Pitzer Coll, WM Keck Sci Dept, Claremont, CA 91711 USA
[4] Scripps Coll, WM Keck Sci Dept, Claremont, CA 91711 USA
Keywords
Fading memory; Reinforcement learning; Novelty detection
DOI
10.1016/j.jtbi.2019.06.007
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Evolved and engineered organisms must adapt to fluctuating environments that are often only partially observed. We show that adaptation to a second environment can be significantly harder after adapting to a first, completely unrelated environment, even when using second-order learning algorithms and a constant learning rate. In effect, there is a lack of fading memory in the organism's performance. However, organisms can adapt well to the second environment by incorporating a simple novelty detection algorithm that signals when the environment has changed and reinitializing the parameters that define their behavior if so. We propose that it may be fruitful to look for signs of this novelty detection in biological organisms, and to engineer novelty detection algorithms into artificial organisms. (C) 2019 Elsevier Ltd. All rights reserved.
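The detect-and-reset idea described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm or experimental setup: the learner here is a plain running-mean estimator standing in for a reinforcement learner, and its memory problem comes from a decaying 1/n step size rather than the constant learning rate the abstract mentions. The class name `ResettingEstimator`, the helper `final_estimate`, and the `threshold` and `window` parameters are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ResettingEstimator:
    """Toy running-mean learner that reinitializes when recent surprise is high.

    All names and default values are illustrative assumptions,
    not taken from the paper.
    """
    threshold: float = 1.0  # hypothetical novelty threshold on mean recent error
    window: int = 5         # hypothetical number of recent errors to average
    estimate: float = 0.0
    count: int = 0
    resets: int = 0
    errors: deque = field(default_factory=deque)

    def update(self, observation: float) -> None:
        # Surprise: how far the observation is from the current estimate.
        self.errors.append(abs(observation - self.estimate))
        if len(self.errors) > self.window:
            self.errors.popleft()

        # Novelty signal: the window is full and its mean error is too large.
        window_full = len(self.errors) == self.window
        if window_full and sum(self.errors) / self.window > self.threshold:
            # Reinitialize the parameters that define the learner's behavior.
            self.estimate, self.count = 0.0, 0
            self.errors.clear()
            self.resets += 1

        # Ordinary running-mean update (step size 1/count).
        self.count += 1
        self.estimate += (observation - self.estimate) / self.count


def final_estimate(threshold: float):
    """Run 50 steps in one environment (value 1.0), then 50 in another (value 10.0)."""
    learner = ResettingEstimator(threshold=threshold)
    for obs in [1.0] * 50 + [10.0] * 50:
        learner.update(obs)
    return learner.estimate, learner.resets
```

Calling `final_estimate(1.0)` enables the novelty detector, while `final_estimate(float("inf"))` effectively disables it. In this toy setting the learner without resets stays anchored between the two environments, whereas the resetting learner tracks the second environment after a single reinitialization.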
Pages: 44-50
Page count: 7