Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems

Cited by: 296
Authors
Matignon, Laetitia [1]
Laurent, Guillaume J. [1]
Le Fort-Piat, Nadine [1]
Affiliation
[1] UFC ENSMM UTBM, FEMTO-ST Institute, UMR CNRS 6174, F-25000 Besançon, France
Keywords
CONVERGENCE; SYSTEMS;
DOI
10.1017/S0269888912000057
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to these challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. These algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value (recursive FMQ) and win-or-learn-fast policy hill climbing (WoLF-PHC). An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
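The survey's central object is the independent learner: each agent keeps a Q-table over its own actions only and never observes its teammates. As a concrete illustration, the sketch below implements the hysteretic Q-learning update mentioned in the abstract, which applies a fast learning rate to positive temporal-difference errors and a slower one to negative errors, so that occasional coordination successes are not washed out by teammates' exploration. The class name, hyperparameter values and epsilon-greedy action selection are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

class HystereticQLearner:
    """One independent learner with a Q-table over its own actions only."""

    def __init__(self, n_states, n_actions, alpha=0.1, beta=0.01,
                 gamma=0.95, epsilon=0.1):
        # alpha: learning rate for positive TD errors ("good news")
        # beta:  smaller learning rate for negative TD errors ("bad news")
        # Values here are illustrative, not those tuned in the paper.
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.beta = alpha, beta
        self.gamma, self.epsilon = gamma, epsilon

    def act(self, state, rng):
        # epsilon-greedy exploration over the agent's own actions
        if rng.random() < self.epsilon:
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        # standard Q-learning target computed from the agent's own table
        target = reward + self.gamma * np.max(self.q[next_state])
        delta = target - self.q[state, action]
        # hysteresis: learn quickly from increases, slowly from decreases
        rate = self.alpha if delta >= 0 else self.beta
        self.q[state, action] += rate * delta
```

Setting beta equal to alpha recovers plain decentralized Q-learning, while beta = 0 never lets a Q-value decrease and so behaves in the optimistic spirit of distributed Q-learning; intermediate settings are what give hysteretic Q-learning its robustness to stochastic rewards.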
Pages: 1-31 (31 pages)
References (62 in total)
[31] Kuyer L., 2008, Lecture Notes in Artificial Intelligence, Vol. 5211, p. 656. DOI 10.1007/978-3-540-87479-9_61
[32] Lauer M., 2000, Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000).
[33] Laurent G. J., 2010, Innovation Knowledge, Vol. 15.
[34] Littman M. L., 2001, Cognitive Systems Research, Vol. 2, p. 55. DOI 10.1016/S1389-0417(01)00015-8
[35] Luntz J. E., Messner W., Choset H. Distributed manipulation using discrete actuator arrays. International Journal of Robotics Research, 2001, 20(7): 553-583.
[36] Mataric M. J. Using communication to reduce locality in distributed multiagent learning. Journal of Experimental & Theoretical Artificial Intelligence, 1998, 10(3): 357-369.
[37] Matignon L., 2008, Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
[38] Matignon L., Laurent G. J., Le Fort-Piat N. Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2007: 64-69.
[39] Matignon L., 2006, Lecture Notes in Computer Science, Vol. 4131, p. 840.
[40] Matignon L., Laurent G. J., Le Fort-Piat N., Chapuis Y.-A. Designing decentralized controllers for distributed-air-jet MEMS-based micromanipulators by reinforcement learning. Journal of Intelligent & Robotic Systems, 2010, 59(2): 145-166.