Multi-Agent Reinforcement Learning Based on Clustering in Two-Player Games

Cited by: 0
Authors
Li, Weifan [1 ]
Zhu, Yuanheng
Zhao, Dongbin
Affiliation
[1] Chinese Acad Sci, State Key Lab Management & Control Complex Syst, Inst Automat, Beijing 100190, Peoples R China
Source
2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019) | 2019
Keywords
reinforcement learning; unsupervised clustering; matrix game; multi-agent;
DOI
10.1109/ssci44817.2019.9003120
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Non-stationary environments are common in the real world, including adversarial settings and multi-agent problems. A multi-agent environment is a typical non-stationary environment: each agent sharing the environment must learn efficient interactions to maximize its expected reward. Independent reinforcement learning (InRL) is the simplest form, in which each agent treats the other agents as part of the environment. In this paper, we present Max-Mean-Learning-Win-or-Learn-Fast (MML-WoLF), an independent on-policy learning algorithm based on reinforcement clustering. A variational auto-encoder method based on reinforcement learning is proposed to extract features for unsupervised clustering. Based on the clustering results, MML-WoLF uses statistics and the dominated factor to calculate the values of the states that belong to each category, and the agent's policy is iteratively updated from these values. We apply our algorithm to multi-agent problems including matrix games, grid worlds, and a continuous-world game. The clustering results reveal the strategy distribution under the agent's current policy. The experimental results suggest that our method significantly improves average performance over other independent learning algorithms in multi-agent problems.
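As background for the abstract, the "Win or Learn Fast" (WoLF) principle the paper builds on uses a variable policy learning rate: step cautiously when winning, quickly when losing. The sketch below is an illustrative WoLF-style policy hill-climbing update on the matching-pennies matrix game, not the paper's MML-WoLF algorithm (the clustering and variational auto-encoder components are omitted); all hyperparameters and the biased opponent are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
payoff = np.array([[1, -1], [-1, 1]])  # row player's payoff in matching pennies

alpha = 0.1        # Q-value learning rate
delta_win = 0.01   # slow policy step while "winning"
delta_lose = 0.04  # fast policy step while "losing"

Q = np.zeros(2)                 # action values for the stateless game
pi = np.array([0.5, 0.5])       # current mixed policy
pi_avg = pi.copy()              # running average policy
count = 0

for t in range(20000):
    a = rng.choice(2, p=pi)
    b = rng.choice(2, p=[0.8, 0.2])  # opponent biased toward action 0
    r = payoff[a, b]

    # Independent Q-learning update (no next state in a matrix game).
    Q[a] += alpha * (r - Q[a])

    # Track the average policy seen so far.
    count += 1
    pi_avg += (pi - pi_avg) / count

    # WoLF rule: "winning" means the current policy's expected value
    # under Q beats the average policy's; then step cautiously.
    delta = delta_win if pi @ Q >= pi_avg @ Q else delta_lose

    # Hill-climb toward the greedy action, keeping pi a distribution.
    greedy = int(np.argmax(Q))
    pi[greedy] = min(1.0, pi[greedy] + delta)
    pi[1 - greedy] = 1.0 - pi[greedy]

# Against this biased opponent the matching action 0 dominates,
# so the learned policy concentrates on it.
print(pi)
```

Because the opponent plays action 0 with probability 0.8, Q[0] converges near +0.6 and Q[1] near -0.6, and the policy drifts to nearly pure action 0; against a uniform opponent the same loop would instead hover around the (0.5, 0.5) equilibrium.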
Pages: 57-63
Page count: 7