Simple learning rules to cope with changing environments

Cited by: 35
Authors
Gross, Roderich [1]
Houston, Alasdair I. [1]
Collins, Edmund J. [2]
McNamara, John M. [2]
Dechaume-Moncharmont, Francois-Xavier [3]
Franks, Nigel R. [1]
Affiliations
[1] Univ Bristol, Sch Biol Sci, Bristol BS8 1UG, Avon, England
[2] Univ Walk, Univ Bristol, Dept Math, Bristol BS8 1TW, Avon, England
[3] Univ Bourgogne, F-21000 Dijon, France
Keywords
decision making; learning rules; dynamic environments; multi-armed bandit
DOI
10.1098/rsif.2007.1348
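Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract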
We consider an agent that must choose repeatedly among several actions. Each action has a certain probability of giving the agent an energy reward, and costs may be associated with switching between actions. The agent does not know which action has the highest reward probability, and the probabilities change randomly over time. We study two learning rules that have been widely used to model decision-making processes in animals, one deterministic and the other stochastic. In particular, we examine the influence of the rules' 'learning rate' on the agent's energy gain. We compare the performance of each rule with the best performance attainable when the agent has either full knowledge or no knowledge of the environment. Over relatively short periods of time, both rules are successful in enabling agents to exploit their environment. Moreover, under a range of effective learning rates, both rules are equivalent, and can be expressed by a third rule that requires the agent to select the action for which the current run of unsuccessful trials is shortest. However, the performance of both rules is relatively poor over longer periods of time, and under most circumstances no better than the performance an agent could achieve without knowledge of the environment. We propose a simple extension to the original rules that enables agents to learn about and effectively exploit a changing environment for an unlimited period of time.
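The "third rule" mentioned in the abstract (select the action whose current run of unsuccessful trials is shortest) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tie-breaking order, the way reward probabilities are resampled, and the omission of switching costs are all assumptions made for illustration only.

```python
import random


def shortest_failure_run_choice(failure_runs, current=None):
    """Return the index of the action with the shortest run of failures.

    Tie-breaking is an assumption: prefer the current action if it is
    among the best, otherwise take the lowest index.
    """
    best = min(failure_runs)
    if current is not None and failure_runs[current] == best:
        return current
    return failure_runs.index(best)


def simulate(probs, steps, change_prob=0.0, seed=0):
    """Run the rule on a bandit and return the number of rewards collected.

    probs       -- initial reward probability of each action
    change_prob -- per-step chance that an action's probability is
                   resampled uniformly from [0, 1) (hypothetical dynamics)
    """
    rng = random.Random(seed)
    probs = list(probs)
    failure_runs = [0] * len(probs)  # consecutive failures per action
    current = None
    total = 0
    for _ in range(steps):
        current = shortest_failure_run_choice(failure_runs, current)
        if rng.random() < probs[current]:
            total += 1
            failure_runs[current] = 0   # a success ends the failure run
        else:
            failure_runs[current] += 1  # the failure run grows by one
        # The environment changes randomly over time (assumed mechanism).
        for i in range(len(probs)):
            if rng.random() < change_prob:
                probs[i] = rng.random()
    return total
```

Calling `simulate([0.3, 0.7], steps=1000, change_prob=0.01)` gives the reward count for one hypothetical run; comparing it across values of `change_prob` offers a rough feel for how such a rule behaves as the environment becomes less stable.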
Pages: 1193-1202
Number of pages: 10