A New Multi-Agent Reinforcement Learning Method Based on Evolving Dynamic Correlation Matrix

Cited by: 9
Authors
Gan, Xingli [1, 2]
Guo, Hongliang [3]
Li, Zhan [3]
Affiliations
[1] China Elect Technol Grp Corp, Res Inst 54, Shijiazhuang 050081, Hebei, Peoples R China
[2] State Key Lab Satellite Nav Syst & Equipment Tech, Shijiazhuang 050000, Hebei, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu 610054, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Heuristic algorithms; Correlation; Evolutionary computation; Convergence; Tuning; Roads; Multi-agent reinforcement learning; dynamic correlation matrix; convergence; meta-parameter evolution;
DOI
10.1109/ACCESS.2019.2946848
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Multi-agent reinforcement learning approaches can be roughly classified into two categories. The first is the agent-based approach, which can be implemented in real distributed systems, although most approaches of this type cannot provide meaningful theoretical verification. The second is the more formalized approach, which can provide theoretical results; however, most current algorithms of this type require unrealistic global communication, which makes them impractical for real distributed systems. In this article, we propose a multi-agent reinforcement learning approach based on a dynamic correlation matrix, in which the meta-parameters are evolved using an evolutionary algorithm. We believe that our approach can fill the gap between these two traditional kinds of multi-agent reinforcement learning approaches by providing both agent-level implementation and system-level convergence verification. The basic idea of this approach is that agents learn not only from local environmental feedback, i.e., their own experiences and rewards, but also from other agents' experiences. In this way, the agents' learning speed can be increased significantly. The performance of the proposed algorithm is demonstrated in several application scenarios, including blackjack games, urban traffic control systems, and multi-robot foraging.
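The core idea above, that each agent learns from other agents' experiences as well as its own, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes tabular Q-learning, a fixed correlation matrix `C` in which the (hypothetical) entry `C[j, i]` scales how strongly agent `j` learns from a transition experienced by agent `i`, and the hypothetical function name `shared_q_update`. The paper's matrix is dynamic and its meta-parameters are evolved; neither mechanism is modeled here.

```python
import numpy as np


def shared_q_update(Q, C, agent, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Propagate one transition experienced by `agent` to every agent's Q-table.

    Hypothetical sketch, not the published method.
    Q: array of shape (n_agents, n_states, n_actions), updated in place.
    C: array of shape (n_agents, n_agents); C[j, i] weights how much agent j
       learns from agent i's experience (assumed here: C[i, i] == 1).
    """
    n_agents = Q.shape[0]
    for j in range(n_agents):
        weight = C[j, agent]
        if weight == 0.0:
            continue  # agent j ignores this agent's experience entirely
        # Standard Q-learning TD error, evaluated with agent j's own estimates.
        td_target = r + gamma * np.max(Q[j, s_next])
        td_error = td_target - Q[j, s, a]
        # Shared experience is scaled by the correlation weight.
        Q[j, s, a] += alpha * weight * td_error
    return Q


if __name__ == "__main__":
    n_agents, n_states, n_actions = 3, 4, 2
    Q = np.zeros((n_agents, n_states, n_actions))
    # Agents 0 and 1 are partially correlated; agent 2 learns only on its own.
    C = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    # Agent 0 takes action 1 in state 0, receives reward 1, lands in state 2.
    shared_q_update(Q, C, agent=0, s=0, a=1, r=1.0, s_next=2)
    print(Q[:, 0, 1])
```

In this toy run, agent 0 applies the full update, agent 1 applies half of it, and agent 2 is unaffected. In a dynamic-correlation scheme, `C` itself would change during learning, which is where the speed-up described in the abstract would come from.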
Pages: 162127-162138
Number of pages: 12