Effective Multi-Agent Deep Reinforcement Learning Control With Relative Entropy Regularization

Cited by: 3
Authors
Miao, Chenyang [1,2]
Cui, Yunduan [2]
Li, Huiyun [2]
Wu, Xinyu [2]
Affiliations
[1] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Hong Kong Macao Joint Lab Human Machine, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Multi-Agent Reinforcement Learning (MARL); robot learning; robot
DOI
10.1109/TASE.2024.3398712
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
This paper focuses on developing an effective Multi-Agent Reinforcement Learning (MARL) approach that quickly explores optimal control policies for multiple agents through interactions with unknown environments. Multi-Agent Continuous Dynamic Policy Gradient (MACDPP) is proposed to tackle the limited learning capability and sample efficiency of current MARL approaches. It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization into the Centralized Training with Decentralized Execution (CTDE) framework with an Actor-Critic (AC) structure. Evaluated on multi-agent cooperation and competition tasks as well as traditional control tasks, including OpenAI benchmarks and robot arm manipulation, MACDPP demonstrates significant superiority in learning capability and sample efficiency over both related multi-agent and widely implemented single-agent baselines. Across all tasks, it converges to a 62% higher average return and uses 38% fewer samples than the second-best baseline, indicating the potential of MARL in challenging control scenarios, especially when the number of interactions is limited. The open-source code of MACDPP is available at https://github.com/AdrienLin1/MACDPP. Note to Practitioners - Learning a proper cooperation strategy over multiple agents in complicated systems has been a long-standing challenge in Reinforcement Learning. Our work extends the traditional MARL approach FKDPP, which has been successfully deployed in a real-world chemical plant by Yokogawa, to the CTDE framework and AC structure supporting continuous actions. This extension significantly expands its range of applications from cooperative/competitive tasks to the joint control of one complex system while maintaining its effectiveness.
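The mechanism named in the abstract, adding relative entropy regularization to per-agent policy updates under CTDE with an actor-critic structure, can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the network sizes, the KL coefficient beta, the Gaussian actors, and the toy centralized critic are all hypothetical choices for demonstration; the actual MACDPP code lives in the linked repository.

```python
# Minimal sketch (assumed, not the authors' code) of a relative-entropy-
# regularized actor update in a CTDE actor-critic setting.
import copy
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Per-agent actor mapping a local observation to a diagonal Gaussian policy."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.mu = nn.Linear(64, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        h = self.net(obs)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

obs_dim, act_dim, n_agents, beta = 8, 2, 3, 0.1    # beta: KL weight (assumed)
actors = [GaussianActor(obs_dim, act_dim) for _ in range(n_agents)]
old_actors = [copy.deepcopy(a) for a in actors]     # frozen previous policies
critic = nn.Linear(n_agents * (obs_dim + act_dim), 1)  # toy centralized Q

obs = torch.randn(32, n_agents, obs_dim)            # batch of joint observations
for i, (actor, old) in enumerate(zip(actors, old_actors)):
    dist, old_dist = actor.dist(obs[:, i]), old.dist(obs[:, i])
    act = dist.rsample()                            # reparameterized sample
    # The centralized critic scores the joint action; the other agents'
    # actions are drawn from their frozen previous policies.
    joint_act = torch.stack(
        [act if j == i else old_actors[j].dist(obs[:, j]).sample()
         for j in range(n_agents)], dim=1)
    q = critic(torch.cat([obs.flatten(1), joint_act.flatten(1)], dim=-1))
    # Relative entropy regularization: penalize divergence from the previous
    # policy so that simultaneous per-agent updates remain consistent.
    kl = torch.distributions.kl_divergence(dist, old_dist).sum(-1)
    actor_loss = (-q.squeeze(-1) + beta * kl).mean()
    actor_loss.backward()                           # gradients for agent i's actor
```

In this framing, the KL term plays the role of the relative entropy regularizer described in the abstract; how MACDPP actually structures the critic and the regularization may differ and is documented in the repository.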
Pages: 3704-3718
Page count: 15