Multi-Agent Reinforcement Learning Based on Clustering in Two-Player Games

Cited by: 0
Authors
Li, Weifan [1 ]
Zhu, Yuanheng
Zhao, Dongbin
Affiliation
[1] Chinese Acad Sci, State Key Lab Management & Control Complex Syst, Inst Automat, Beijing 100190, Peoples R China
Source
2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019) | 2019
Keywords
reinforcement learning; unsupervised clustering; matrix game; multi-agent;
DOI
10.1109/ssci44817.2019.9003120
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Non-stationary environments are common in the real world, including adversarial settings and multi-agent problems. A multi-agent environment is a typical non-stationary environment: each agent sharing the environment must learn efficient interactions to maximize its expected reward. Independent reinforcement learning (InRL) is the simplest form, in which each agent treats the other agents as part of the environment. In this paper, we present Max-Mean-Learning-Win-or-Learn-Fast (MML-WoLF), an independent on-policy learning algorithm based on reinforcement clustering. A variational auto-encoder method based on reinforcement learning is proposed to extract features for unsupervised clustering. Based on the clustering results, MML-WoLF uses statistics and the dominated factor to calculate the values of the states that belong to each category, and the agent's policy is iteratively updated from these values. We apply our algorithm to multi-agent problems including matrix games, grid worlds, and a continuous-world game. The clustering results reveal the strategy distribution under the agent's current policy. The experimental results suggest that our method significantly improves average performance over other independent learning algorithms in multi-agent problems.
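As background for the abstract, the "Win or Learn Fast" (WoLF) principle the paper builds on uses a variable policy learning rate: step cautiously when winning, quickly when losing. The sketch below is an illustrative WoLF-style policy hill-climbing update on the matching-pennies matrix game, not the paper's MML-WoLF algorithm (the clustering and variational auto-encoder components are omitted); all hyperparameters and the biased opponent are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
payoff = np.array([[1, -1], [-1, 1]])  # row player's payoff in matching pennies

alpha = 0.1        # Q-value learning rate
delta_win = 0.01   # slow policy step while "winning"
delta_lose = 0.04  # fast policy step while "losing"

Q = np.zeros(2)                 # action values for the stateless game
pi = np.array([0.5, 0.5])       # current mixed policy
pi_avg = pi.copy()              # running average policy
count = 0

for t in range(20000):
    a = rng.choice(2, p=pi)
    b = rng.choice(2, p=[0.8, 0.2])  # opponent biased toward action 0
    r = payoff[a, b]

    # Independent Q-learning update (no next state in a matrix game).
    Q[a] += alpha * (r - Q[a])

    # Track the average policy seen so far.
    count += 1
    pi_avg += (pi - pi_avg) / count

    # WoLF rule: "winning" means the current policy's expected value
    # under Q beats the average policy's; then step cautiously.
    delta = delta_win if pi @ Q >= pi_avg @ Q else delta_lose

    # Hill-climb toward the greedy action, keeping pi a distribution.
    greedy = int(np.argmax(Q))
    pi[greedy] = min(1.0, pi[greedy] + delta)
    pi[1 - greedy] = 1.0 - pi[greedy]

# Against this biased opponent the matching action 0 dominates,
# so the learned policy concentrates on it.
print(pi)
```

Because the opponent plays action 0 with probability 0.8, Q[0] converges near +0.6 and Q[1] near -0.6, and the policy drifts to nearly pure action 0; against a uniform opponent the same loop would instead hover around the (0.5, 0.5) equilibrium.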
Pages: 57-63
Page count: 7