Exponential moving average based multiagent reinforcement learning algorithms

Cited by: 13
Authors
Awheda, Mostafa D. [1 ]
Schwartz, Howard M. [1 ]
Affiliation
[1] Carleton Univ, Dept Syst & Comp Engn, 1125 Colonel By Dr, Ottawa, ON K1S 5B6, Canada
Keywords
Multi-agent learning systems; Reinforcement learning; Markov decision processes; Nash equilibrium; Adaptive optimal control; H-infinity control; Equation
DOI
10.1007/s10462-015-9447-5
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Two multi-agent policy iteration learning algorithms are proposed in this work. Both algorithms use the exponential moving average (EMA) approach together with the Q-learning algorithm as a basis for updating the learning agent's policy so that it converges to a Nash equilibrium policy. The first algorithm uses a constant learning rate when updating the agent's policy, whereas the second uses two different decaying learning rates that are selected by either the Win-or-Learn-Fast (WoLF) mechanism or the Win-or-Learn-Slow (WoLS) mechanism. The WoLS mechanism is introduced in this article to make the algorithm learn fast when it is winning and learn slowly when it is losing. The second algorithm uses the rewards received by the learning agent to decide which mechanism (WoLF or WoLS) to apply to the game being learned. Both algorithms are analyzed theoretically, and a mathematical proof of convergence to a pure Nash equilibrium is provided for each. For games with a mixed Nash equilibrium, the mathematical analysis shows that the second algorithm converges to an equilibrium; although the analysis does not explicitly establish that this equilibrium is a Nash equilibrium, the simulation results indicate that it is. The proposed algorithms are examined on a variety of matrix and stochastic games, and simulation results show that the second algorithm converges in a wider variety of situations than state-of-the-art multi-agent reinforcement learning algorithms.
Pages: 299-332
Number of pages: 34
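To make the policy-update idea described in the abstract concrete, the following Python sketch combines a stateless Q-learning update with an exponential-moving-average policy update and a simplified WoLF-style choice between two learning rates. The payoff matrix, the step sizes, the "winning" test, and the uniformly random opponent are illustrative assumptions, not the authors' exact formulation or proof setting.

import numpy as np

# Minimal sketch of an EMA-based policy update combined with Q-learning for a
# stateless 2-action matrix game. All numerical settings below are assumptions
# made for illustration only.

rng = np.random.default_rng(0)

n_actions = 2
payoff = np.array([[3.0, 1.0],     # assumed row-player payoff matrix; action 0 dominates
                   [2.0, 0.0]])

Q = np.zeros(n_actions)                    # Q-values of the learning agent
pi = np.full(n_actions, 1.0 / n_actions)   # current mixed policy

alpha = 0.1       # Q-learning step size (assumed)
eta_win = 0.01    # EMA policy rate when winning (small: WoLF learns slowly when winning)
eta_lose = 0.04   # EMA policy rate when losing (large: WoLF learns fast when losing)

for t in range(20_000):
    a = rng.choice(n_actions, p=pi)   # agent samples its action from the current policy
    b = rng.integers(n_actions)       # opponent plays uniformly at random (assumption)
    r = payoff[a, b]

    # Standard Q-learning update for a repeated (stateless) matrix game.
    Q[a] += alpha * (r - Q[a])

    # Simplified WoLF test (assumed): the agent is "winning" if its current
    # policy's expected Q-value is at least that of the uniform policy.
    winning = pi @ Q >= Q.mean()
    eta = eta_win if winning else eta_lose

    # Exponential moving average of the policy toward the indicator vector of
    # the greedy action, so the policy drifts toward the current best response.
    greedy = np.zeros(n_actions)
    greedy[int(np.argmax(Q))] = 1.0
    pi = (1.0 - eta) * pi + eta * greedy
    pi /= pi.sum()                    # guard against numerical drift

print("learned policy:", np.round(pi, 3))
print("learned Q-values:", np.round(Q, 3))

In this assumed game the first action strictly dominates, so the EMA update drives the policy toward the pure equilibrium action; the decaying-rate and WoLS variants discussed in the paper replace the constant rates and the winning test used here.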