Flexible Exploration Strategies in Multi-Agent Reinforcement Learning for Instability by Mutual Learning

Cited by: 1
Authors
Miyashita, Yuki [1 ]
Sugawara, Toshiharu [2 ]
Affiliations
[1] Shimizu Corp, Tokyo, Japan
[2] Waseda Univ, Comp Sci & Commun Engn, Tokyo, Japan
Source
2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA | 2022
Keywords
Exploration; Coordination; Multi-agent deep reinforcement learning
DOI
10.1109/ICMLA55696.2022.00100
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
A fundamental challenge in multi-agent reinforcement learning is effective exploration of state-action spaces, because agents must learn their policies in a non-stationary environment caused by the changing policies of the other learning agents. As learning progresses, different undesired situations may appear one after another, and agents must learn again to adapt to them. Therefore, agents must relearn with a high exploration probability to find the appropriate actions for each newly exposed situation. However, existing algorithms can fail to relearn behavior owing to insufficient exploration in these situations, because agents usually become exploitation-oriented under simple exploration strategies such as the ε-greedy strategy. We therefore propose two simple exploration strategies in which each agent monitors the trend of its performance and controls its exploration probability, ε, based on performance transitions. Using a coordination problem called the PushBlock problem, which exhibits the above issue, we show that the proposed methods improve overall performance relative to conventional ε-greedy strategies, and we analyze their effects on the generated behavior.
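The core idea in the abstract, agents raising ε when their performance trend worsens and lowering it when performance recovers, can be sketched as a small controller. This is an illustrative assumption, not the paper's exact update rule: the class name `AdaptiveEpsilonGreedy`, the two-window trend comparison, and the parameter values are all hypothetical.

```python
import random
from collections import deque

class AdaptiveEpsilonGreedy:
    """Hypothetical sketch: adjust ε from the trend of recent episode
    returns, in the spirit of the paper's performance-monitoring idea
    (the actual proposed update rule may differ)."""

    def __init__(self, eps=0.3, eps_min=0.05, eps_max=1.0, window=50, step=0.01):
        self.eps, self.eps_min, self.eps_max = eps, eps_min, eps_max
        self.step = step
        self.recent = deque(maxlen=window)  # newest window of returns
        self.older = deque(maxlen=window)   # the window before it

    def record(self, episode_return):
        # Returns that fall out of `recent` slide into `older`,
        # giving two consecutive windows to compare.
        if len(self.recent) == self.recent.maxlen:
            self.older.append(self.recent[0])
        self.recent.append(episode_return)
        if len(self.older) == self.older.maxlen:
            old = sum(self.older) / len(self.older)
            new = sum(self.recent) / len(self.recent)
            if new < old:   # performance dropping -> explore more
                self.eps = min(self.eps_max, self.eps + self.step)
            else:           # stable or improving -> exploit more
                self.eps = max(self.eps_min, self.eps - self.step)

    def select(self, q_values):
        # Standard ε-greedy action selection over a list of Q-values.
        if random.random() < self.eps:
            return random.randrange(len(q_values))
        return max(range(len(q_values)), key=q_values.__getitem__)
```

With a steadily improving return sequence, ε decays toward `eps_min`; a sustained drop in returns pushes it back up, restoring exploration when a new undesired situation appears.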
Pages: 579-584 (6 pages)