Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems

Cited by: 296
Authors
Matignon, Laetitia [1]
Laurent, Guillaume J. [1]
Le Fort-Piat, Nadine [1]
Affiliation
[1] UFC ENSMM UTBM, FEMTO-ST Institute, UMR CNRS 6174, F-25000 Besançon, France
Keywords
CONVERGENCE; SYSTEMS;
DOI
10.1017/S0269888912000057
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to these challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. These algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value (recursive FMQ) and win-or-learn-fast policy hill climbing (WoLF-PHC). An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
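The survey's central object is the independent learner: each agent keeps a Q-table over its own actions only and never observes its teammates. As a concrete illustration, the sketch below implements the hysteretic Q-learning update mentioned in the abstract, which applies a fast learning rate to positive temporal-difference errors and a slower one to negative errors, so that occasional coordination successes are not washed out by teammates' exploration. The class name, hyperparameter values and epsilon-greedy action selection are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

class HystereticQLearner:
    """One independent learner with a Q-table over its own actions only."""

    def __init__(self, n_states, n_actions, alpha=0.1, beta=0.01,
                 gamma=0.95, epsilon=0.1):
        # alpha: learning rate for positive TD errors ("good news")
        # beta:  smaller learning rate for negative TD errors ("bad news")
        # Values here are illustrative, not those tuned in the paper.
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.beta = alpha, beta
        self.gamma, self.epsilon = gamma, epsilon

    def act(self, state, rng):
        # epsilon-greedy exploration over the agent's own actions
        if rng.random() < self.epsilon:
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        # standard Q-learning target computed from the agent's own table
        target = reward + self.gamma * np.max(self.q[next_state])
        delta = target - self.q[state, action]
        # hysteresis: learn quickly from increases, slowly from decreases
        rate = self.alpha if delta >= 0 else self.beta
        self.q[state, action] += rate * delta
```

Setting beta equal to alpha recovers plain decentralized Q-learning, while beta = 0 never lets a Q-value decrease and so behaves in the optimistic spirit of distributed Q-learning; intermediate settings are what give hysteretic Q-learning its robustness to stochastic rewards.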
Pages: 1-31 (31 pages)
References (62 in total)
[31] Kuyer L., 2008, Lecture Notes in Artificial Intelligence, Vol. 5211, p. 656. DOI 10.1007/978-3-540-87479-9_61
[32] Lauer M., 2000, Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000).
[33] Laurent G. J., 2010, Innovation Knowledge, Vol. 15.
[34] Littman M. L., 2001, Cognitive Systems Research, Vol. 2, p. 55. DOI 10.1016/S1389-0417(01)00015-8
[35] Luntz J. E., Messner W., Choset H. Distributed manipulation using discrete actuator arrays. International Journal of Robotics Research, 2001, 20(7): 553-583.
[36] Mataric M. J. Using communication to reduce locality in distributed multiagent learning. Journal of Experimental & Theoretical Artificial Intelligence, 1998, 10(3): 357-369.
[37] Matignon L., 2008, Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
[38] Matignon L., Laurent G. J., Le Fort-Piat N. Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2007: 64-69.
[39] Matignon L., 2006, Lecture Notes in Computer Science, Vol. 4131, p. 840.
[40] Matignon L., Laurent G. J., Le Fort-Piat N., Chapuis Y.-A. Designing decentralized controllers for distributed-air-jet MEMS-based micromanipulators by reinforcement learning. Journal of Intelligent & Robotic Systems, 2010, 59(2): 145-166.