Robust Multi-Agent Reinforcement Learning Method Based on Adversarial Domain Randomization for Real-World Dual-UAV Cooperation

Cited by: 13
Authors
Chen, Shutong [1 ]
Liu, Guanjun [1 ]
Zhou, Ziyuan [1 ]
Zhang, Kaiwen [1 ]
Wang, Jiacun [2 ]
Affiliations
[1] Tongji Univ, Dept Comp Sci, Shanghai 201804, Peoples R China
[2] Monmouth Univ, Dept Comp Sci & Software Engn, West Long Branch, NJ 07764 USA
Source
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, Vol. 9, No. 1
Keywords
Task analysis; Training; Markov processes; Games; Autonomous aerial vehicles; Reinforcement learning; Transportation; Multi-agent reinforcement learning; sim2real transfer; adversarial domain randomization; prioritized experience replay; dual-UAV cooperative transportation;
DOI
10.1109/TIV.2023.3307134
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
A control system for multiple unmanned aerial vehicles (multi-UAV) is generally very complex when the UAVs complete a task in a closely cooperative manner, e.g., two UAVs cooperatively transporting a package of goods. Multi-agent reinforcement learning (MARL) offers a promising solution for such complex control. However, MARL relies heavily on trial-and-error exploration, which makes gathering real-world training data a major challenge. Simulation environments are commonly used to overcome this challenge: a control policy is trained in a simulation environment and then transferred to a real-world system. However, a gap often exists between simulation and reality, so a successful transfer is not guaranteed. The domain randomization method provides a workable way to bridge this gap. Nevertheless, the traditional method often suffers from slow convergence during policy training and yields an unstable decision policy. To address these issues, this article proposes an adversarial domain randomization method. It uses an adversarial generator as a "nature player" to generate a more reasonable training environment, so that the trained decision policy can cope with complex situations. Additionally, we improve the prioritized experience replay method so that critical experiences are sampled, increasing training convergence speed without degrading the performance of the trained policy. We apply our method to a real-world dual-UAV cooperative transportation task, and experiments demonstrate its effectiveness compared with traditional methods.
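The abstract mentions an improved prioritized experience replay (PER) that samples critical experiences; the record does not give the paper's algorithm, so the following is only a minimal sketch of the standard PER baseline it builds on, where transitions are drawn with probability proportional to their (TD-error-based) priority. All class and parameter names here are illustrative, not taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Minimal prioritized experience replay sketch.

    Transitions are sampled with probability proportional to
    priority ** alpha, where priority is derived from the TD error.
    This is a generic baseline, not the paper's improved variant.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly prioritization is applied
        self.buffer = []          # stored transitions
        self.priorities = []      # one priority per transition
        self.pos = 0              # next write index (ring buffer)

    def add(self, transition, td_error):
        # Small epsilon keeps every transition sampleable.
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(priority)
        else:
            # Overwrite the oldest transition once capacity is reached.
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Weighted sampling with replacement, biased toward high TD error.
        idx = random.choices(range(len(self.buffer)),
                             weights=self.priorities, k=batch_size)
        return [self.buffer[i] for i in idx]
```

In a training loop, `add` would be called with each new transition and its TD error, and `sample` would supply minibatches so that high-error ("critical") experiences are revisited more often, which is the mechanism the abstract credits for faster convergence.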
Pages: 1615-1627
Page count: 13