Flexible Exploration Strategies in Multi-Agent Reinforcement Learning for Instability by Mutual Learning

Cited by: 1
Authors
Miyashita, Yuki [1 ]
Sugawara, Toshiharu [2 ]
Affiliations
[1] Shimizu Corp, Tokyo, Japan
[2] Waseda Univ, Comp Sci & Commun Engn, Tokyo, Japan
Source
2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA | 2022
Keywords
Exploration; Coordination; Multi-agent deep reinforcement learning
DOI
10.1109/ICMLA55696.2022.00100
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
A fundamental challenge in multi-agent reinforcement learning is effective exploration of state-action spaces, because agents must learn their policies in a non-stationary environment caused by the changing policies of the other learning agents. As learning progresses, different undesired situations may appear one after another, and agents must learn again to adapt to them. Therefore, agents must relearn with a high exploration probability to find the appropriate actions for each newly exposed situation. However, existing algorithms can fail to relearn behavior owing to insufficient exploration in these situations, because agents usually become exploitation-oriented under simple exploration strategies such as the ε-greedy strategy. We therefore propose two simple exploration strategies in which each agent monitors the trend of its performance and controls its exploration probability, ε, based on performance transitions. Using a coordination problem called the PushBlock problem, which exhibits the above issue, we show that the proposed methods improve overall performance relative to conventional ε-greedy strategies, and we analyze their effects on the generated behavior.
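The core idea in the abstract, agents raising ε when their performance trend worsens and lowering it when performance recovers, can be sketched as a small controller. This is an illustrative assumption, not the paper's exact update rule: the class name `AdaptiveEpsilonGreedy`, the two-window trend comparison, and the parameter values are all hypothetical.

```python
import random
from collections import deque

class AdaptiveEpsilonGreedy:
    """Hypothetical sketch: adjust ε from the trend of recent episode
    returns, in the spirit of the paper's performance-monitoring idea
    (the actual proposed update rule may differ)."""

    def __init__(self, eps=0.3, eps_min=0.05, eps_max=1.0, window=50, step=0.01):
        self.eps, self.eps_min, self.eps_max = eps, eps_min, eps_max
        self.step = step
        self.recent = deque(maxlen=window)  # newest window of returns
        self.older = deque(maxlen=window)   # the window before it

    def record(self, episode_return):
        # Returns that fall out of `recent` slide into `older`,
        # giving two consecutive windows to compare.
        if len(self.recent) == self.recent.maxlen:
            self.older.append(self.recent[0])
        self.recent.append(episode_return)
        if len(self.older) == self.older.maxlen:
            old = sum(self.older) / len(self.older)
            new = sum(self.recent) / len(self.recent)
            if new < old:   # performance dropping -> explore more
                self.eps = min(self.eps_max, self.eps + self.step)
            else:           # stable or improving -> exploit more
                self.eps = max(self.eps_min, self.eps - self.step)

    def select(self, q_values):
        # Standard ε-greedy action selection over a list of Q-values.
        if random.random() < self.eps:
            return random.randrange(len(q_values))
        return max(range(len(q_values)), key=q_values.__getitem__)
```

With a steadily improving return sequence, ε decays toward `eps_min`; a sustained drop in returns pushes it back up, restoring exploration when a new undesired situation appears.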
Pages: 579-584 (6 pages)