Safe multi-agent reinforcement learning for multi-robot control

Cited by: 39
Authors
Gu, Shangding [1, 4]
Kuba, Jakub Grudzien [2]
Chen, Yuanpei [4]
Du, Yali [3]
Yang, Long [4]
Knoll, Alois [1]
Yang, Yaodong [4]
Affiliations
[1] Tech Univ Munich, Dept Comp Sci, Munich, Germany
[2] Univ Oxford, Dept Stat, Oxford, England
[3] Kings Coll London, Dept Informat, London, England
[4] Peking Univ, Inst Artificial Intelligence, Beijing, Peoples R China
Keywords
Constrained Markov game; Constrained policy optimisation; Safe multi-agent benchmarks; Safe multi-robot control; Networks
DOI
10.1016/j.artint.2023.103905
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A challenging problem in robotics is how to control multiple robots cooperatively and safely in real-world applications. Yet developing multi-robot control methods from the perspective of safe multi-agent reinforcement learning (MARL) has rarely been studied. To fill this gap, we investigate safe MARL for multi-robot control on cooperative tasks, in which each robot must not only meet its own safety constraints while maximising its reward, but also consider the constraints of the other robots to guarantee safe team behaviour. Firstly, we formulate the safe MARL problem as a constrained Markov game and employ policy optimisation to solve it theoretically. The proposed algorithm guarantees monotonic improvement in reward and satisfaction of safety constraints at every iteration. Secondly, as approximations to the theoretical solution, we propose two safe multi-agent policy gradient methods: Multi-Agent Constrained Policy Optimisation (MACPO) and MAPPO-Lagrangian. Thirdly, we develop the first three safe MARL benchmarks, namely Safe Multi-Agent MuJoCo (Safe MAMuJoCo), Safe Multi-Agent Robosuite (Safe MARobosuite) and Safe Multi-Agent Isaac Gym (Safe MAIG), to expand the toolkit of the MARL and robot control research communities. Finally, experimental results on the three safe MARL benchmarks indicate that our methods achieve state-of-the-art performance in balancing reward improvement against safety constraint satisfaction, compared with strong baselines. Demos and code are available at https://sites.google.com/view/aij-safe-marl/. Crown Copyright (c) 2023 Published by Elsevier B.V. All rights reserved.
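The abstract names a Lagrangian variant (MAPPO-Lagrangian) without describing its mechanics. As a rough illustration only, the sketch below shows the generic Lagrangian relaxation commonly used for constrained policy optimisation: a non-negative multiplier grows while the measured cost exceeds its limit, and the policy is trained on a cost-penalised advantage. It is not the authors' implementation; the function names, the learning rate and the cost limit are all hypothetical.

```python
# Generic Lagrangian-relaxation sketch for a single cost constraint.
# Assumption: this is NOT taken from the paper; names and defaults are illustrative.

def update_multiplier(lam: float, avg_cost: float, cost_limit: float, lr: float = 0.05) -> float:
    """One projected gradient-ascent step on the Lagrange multiplier.

    The multiplier increases when the average episode cost exceeds the limit,
    decreases otherwise, and is clipped so it never becomes negative.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


def penalised_advantage(reward_adv: float, cost_adv: float, lam: float) -> float:
    """Combine reward and cost advantages into one policy-gradient signal."""
    return reward_adv - lam * cost_adv


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical average costs measured over successive training iterations.
    for avg_cost in [1.5, 1.4, 1.2, 0.9]:
        lam = update_multiplier(lam, avg_cost, cost_limit=1.0)
        print(f"avg_cost={avg_cost:.1f} lambda={lam:.3f}")
```

In a multi-agent setting one would presumably keep a multiplier per agent and per constraint, but that detail is an assumption here rather than something the abstract states.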
Pages: 24