Multiple-UAV Reinforcement Learning Algorithm Based on Improved PPO in Ray Framework

Cited by: 30
Authors
Zhan, Guang [1]
Zhang, Xinmiao [2]
Li, Zhongchao [3]
Xu, Lin [2]
Zhou, Deyun [1]
Yang, Zhen [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China
[2] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110004, Peoples R China
[3] Aviat Ind Corp China, Shenyang Aircraft Design & Res Inst, Shenyang 110035, Peoples R China
Keywords
multiple UAVs; deep reinforcement learning; PPO; curriculum learning; Ray; navigation
DOI
10.3390/drones6070166
Chinese Library Classification (CLC) number
TP7 [Remote sensing technology]
Discipline classification codes
081102; 0816; 081602; 083002; 1404
Abstract
Distributed multi-agent collaborative decision-making technology is the key to general artificial intelligence. This paper uses a self-developed Unity3D collaborative combat environment as the test scenario and sets a task that requires heterogeneous unmanned aerial vehicles (UAVs) to make distributed decisions and complete a cooperative task. To address the poor performance of the traditional proximal policy optimization (PPO) algorithm in complex multi-agent collaboration scenarios, the Critic network in PPO is improved, on top of the distributed training framework Ray, to learn a centralized value function, and the multi-agent proximal policy optimization (MAPPO) algorithm is proposed. In addition, an inheritance training method based on curriculum learning is adopted to improve the generalization performance of the algorithm. In the experiments, MAPPO obtains the highest average cumulative reward among the compared algorithms and completes the task goal in the fewest steps after convergence, which demonstrates that the MAPPO algorithm outperforms the state of the art.
Pages: 13
Related papers
27 in total
  • [1] Boundary-aware vehicle tracking upon UAV
    Han, Yuqi
    Wang, Hongshuo
    Zhang, Zengshuo
    Wang, Wenzheng
    [J]. ELECTRONICS LETTERS, 2020, 56 (17) : 873 - 875
  • [2] In Situ MIMO-WPT Recharging of UAVs Using Intelligent Flying Energy Sources
    Hoseini, Sayed Amir
    Hassan, Jahan
    Bokani, Ayub
    Kanhere, Salil S.
    [J]. DRONES, 2021, 5 (03)
  • [3] A biologically-inspired reinforcement learning based intelligent distributed flocking control for Multi-Agent Systems in presence of uncertain system and dynamic environment
    Jafari, Mohammad
    Xu, Hao
    Carrillo, Luis Rodolfo Garcia
    [J]. IFAC JOURNAL OF SYSTEMS AND CONTROL, 2020, 13
  • [4] Multi-agent deep reinforcement learning with type-based hierarchical group communication
    Jiang, Hao
    Shi, Dianxi
    Xue, Chao
    Wang, Yajie
    Wang, Gongju
    Zhang, Yongjun
    [J]. APPLIED INTELLIGENCE, 2021, 51 (08) : 5793 - 5808
  • [5] Kulkarni TD, 2016, ADV NEUR IN, V29
  • [6] Research on multi-UAV task decision-making based on improved MADDPG algorithm and transfer learning
    Li, Bo
    Liang, Shiyang
    Gan, Zhigang
    Chen, Daqing
    Gao, Peixin
    [J]. INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, 2021, 18 (02) : 82 - 91
  • [7] Liang E, 2018, PR MACH LEARN RES, V80
  • [8] Littman ML, 1994, P 11 INT C MACH LEAR, P157, DOI 10.1016/B978-1-55860-335-6.50027-1
  • [9] Heterogeneous formation control of multiple rotorcrafts with unknown dynamics by reinforcement learning
    Liu, Hao
    Peng, Fachun
    Modares, Hamidreza
    Kiumarsi, Bahare
    [J]. INFORMATION SCIENCES, 2021, 558 : 194 - 207
  • [10] Lowe R, 2017, ADV NEUR IN, V30