Effective Multi-Agent Deep Reinforcement Learning Control With Relative Entropy Regularization

Cited by: 3
Authors
Miao, Chenyang [1,2]
Cui, Yunduan [2]
Li, Huiyun [2]
Wu, Xinyu [2]
Affiliations
[1] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Hong Kong Macao Joint Lab Human Machine, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Multi-Agent Reinforcement Learning (MARL); robot learning; robot
DOI
10.1109/TASE.2024.3398712
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
This paper focuses on developing an effective Multi-Agent Reinforcement Learning (MARL) approach that quickly explores optimal control policies for multiple agents through interactions with unknown environments. Multi-Agent Continuous Dynamic Policy Gradient (MACDPP) is proposed to tackle the limited learning capability and sample efficiency of current MARL approaches. It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization into the Centralized Training with Decentralized Execution (CTDE) framework with an Actor-Critic (AC) structure. Evaluated on multi-agent cooperation and competition tasks as well as traditional control tasks, including OpenAI benchmarks and robot arm manipulation, MACDPP demonstrates significant superiority in learning capability and sample efficiency over both related multi-agent and widely implemented single-agent baselines. Across all tasks, it converges to a 62% higher average return and uses 38% fewer samples than the second-best baseline, indicating the potential of MARL in challenging control scenarios, especially when the number of interactions is limited. The open-source code of MACDPP is available at https://github.com/AdrienLin1/MACDPP. Note to Practitioners - Learning a proper cooperation strategy over multiple agents in complicated systems has been a long-standing challenge in Reinforcement Learning. Our work extends the traditional MARL approach FKDPP, which has been successfully deployed in a real-world chemical plant by Yokogawa, to the CTDE framework and AC structure supporting continuous actions. This extension significantly expands its range of applications from cooperative/competitive tasks to the joint control of one complex system while maintaining its effectiveness.
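The mechanism named in the abstract, adding relative entropy regularization to per-agent policy updates under CTDE with an actor-critic structure, can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the network sizes, the KL coefficient beta, the Gaussian actors, and the toy centralized critic are all hypothetical choices for demonstration; the actual MACDPP code lives in the linked repository.

```python
# Minimal sketch (assumed, not the authors' code) of a relative-entropy-
# regularized actor update in a CTDE actor-critic setting.
import copy
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Per-agent actor mapping a local observation to a diagonal Gaussian policy."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.mu = nn.Linear(64, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        h = self.net(obs)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

obs_dim, act_dim, n_agents, beta = 8, 2, 3, 0.1    # beta: KL weight (assumed)
actors = [GaussianActor(obs_dim, act_dim) for _ in range(n_agents)]
old_actors = [copy.deepcopy(a) for a in actors]     # frozen previous policies
critic = nn.Linear(n_agents * (obs_dim + act_dim), 1)  # toy centralized Q

obs = torch.randn(32, n_agents, obs_dim)            # batch of joint observations
for i, (actor, old) in enumerate(zip(actors, old_actors)):
    dist, old_dist = actor.dist(obs[:, i]), old.dist(obs[:, i])
    act = dist.rsample()                            # reparameterized sample
    # The centralized critic scores the joint action; the other agents'
    # actions are drawn from their frozen previous policies.
    joint_act = torch.stack(
        [act if j == i else old_actors[j].dist(obs[:, j]).sample()
         for j in range(n_agents)], dim=1)
    q = critic(torch.cat([obs.flatten(1), joint_act.flatten(1)], dim=-1))
    # Relative entropy regularization: penalize divergence from the previous
    # policy so that simultaneous per-agent updates remain consistent.
    kl = torch.distributions.kl_divergence(dist, old_dist).sum(-1)
    actor_loss = (-q.squeeze(-1) + beta * kl).mean()
    actor_loss.backward()                           # gradients for agent i's actor
```

In this framing, the KL term plays the role of the relative entropy regularizer described in the abstract; how MACDPP actually structures the critic and the regularization may differ and is documented in the repository.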
Pages: 3704-3718
Page count: 15