Multi-agent Proximal Policy Optimization via Non-fixed Value Clipping

Cited by: 1
Authors
Liu, Chiqiang [1 ]
Li, Dazi [1 ]
Affiliations
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
Source
2023 IEEE 12TH DATA DRIVEN CONTROL AND LEARNING SYSTEMS CONFERENCE (DDCLS) | 2023
Keywords
Multi-agent reinforcement learning; Proximal Policy Optimization; Non-fixed Value Clipping; Noisy value function; LEVEL;
DOI
10.1109/DDCLS58216.2023.10167264
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
With the wide application of multi-agent reinforcement learning (MARL), the field has matured considerably. Multi-agent Proximal Policy Optimization (MAPPO), an extension of the Proximal Policy Optimization (PPO) algorithm, has attracted researchers' attention with its superior performance. However, as the number of agents in multi-agent cooperation tasks increases, the fixed clip range that limits the update step size leads to overfitting and suboptimal policies. In this paper, the MAPPO via Non-fixed Value Clipping (NVC-MAPPO) algorithm is proposed based on MAPPO: Gaussian noise is introduced into the value function and the clipping function, and the clipping function is rewritten into a form called the non-fixed value clipping function. Experiments on the StarCraft II Multi-Agent Challenge (SMAC) verify that the algorithm effectively prevents the step size from changing too much while enhancing the agents' exploration ability, yielding improved performance over MAPPO.
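The abstract's core idea can be illustrated with a short sketch: instead of PPO's constant clip range epsilon, each update draws a clip range perturbed by zero-mean Gaussian noise. The paper's exact formulation (where the noise enters and how it is scheduled) is not given in this record, so the function name, noise placement, and clamping below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nvc_clipped_surrogate(ratio, advantage, eps=0.2, sigma=0.05, rng=None):
    """Illustrative PPO-style clipped surrogate with a non-fixed clip range.

    Instead of a constant clip range eps, each call draws a perturbed
    range eps + n with n ~ N(0, sigma^2), clamped to stay positive.
    This is a sketch of the non-fixed value clipping idea only.
    """
    rng = rng or np.random.default_rng()
    # Perturb the clip range with Gaussian noise; keep it strictly positive.
    noisy_eps = max(1e-3, eps + rng.normal(0.0, sigma))
    # Standard PPO pessimistic minimum, but with the noisy clip range.
    clipped_ratio = np.clip(ratio, 1.0 - noisy_eps, 1.0 + noisy_eps)
    return np.minimum(ratio * advantage, clipped_ratio * advantage)
```

With sigma set to 0 the function reduces to the ordinary fixed-range PPO clipped objective, which makes the role of the noise term easy to isolate in ablations.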
Pages: 1684-1688
Page count: 5
Cited References
18 in total
[1]  
Claus C, 1998, FIFTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-98) AND TENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICAL INTELLIGENCE (IAAI-98) - PROCEEDINGS, P746
[2]  
Duan Y, 2016, PR MACH LEARN RES, V48
[3]  
Foerster JN, 2016, ADV NEUR IN, V29
[4]  
Foerster JN, 2018, AAAI CONF ARTIF INTE, P2974
[5]  
Haarnoja T, 2018, PR MACH LEARN RES, V80
[6]  
Hu J, 2021, Arxiv, DOI arXiv:2106.14334
[7]  
Hüttenrauch M, 2019, J MACH LEARN RES, V20
[8]  
Li TY, 2019, IEEE INT CONF ROBOT, P263, DOI [10.1109/icra.2019.8793864, 10.1109/ICRA.2019.8793864]
[9]  
Mnih V, et al., 2015, Human-level control through deep reinforcement learning, NATURE, V518 (7540), P529-533
[10]  
Oliehoek F. A., 2016, CONCISE INTRO DECENT