Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems

Cited by: 292
Authors
Matignon, Laetitia [1]
Laurent, Guillaume J. [1]
Le Fort-Piat, Nadine [1]
Affiliations
[1] UFC ENSMM UTBM, FEMTO ST Inst, UMR CNRS 6174, F-25000 Besancon, France
Keywords
CONVERGENCE; SYSTEMS
DOI
10.1017/S0269888912000057
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties to manage to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predators pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn fast policy hill climbing. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
Pages: 1-31
Page count: 31