UAV Swarm Cooperative Dynamic Target Search: A MAPPO-Based Discrete Optimal Control Method

Cited by: 6

Authors:

Wei, Dexing [1]

Zhang, Lun [1]

Liu, Quan [1]

Chen, Hao [1]

Huang, Jian [1]

Affiliation:

[1] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410073, Peoples R China

Source:

DRONES | 2024 / Vol. 8 / Issue 06

Keywords:

UAVs; optimal control; dynamic target search; multi-agents; MAPPO;

DOI:

10.3390/drones8060214

Chinese Library Classification number:

TP7 [Remote sensing technology];

Discipline classification codes:

081102 ; 0816 ; 081602 ; 083002 ; 1404 ;

Abstract:

Unmanned aerial vehicles (UAVs) are commonly employed in pursuit and rescue missions, where the target's trajectory is unknown. Traditional methods, such as evolutionary algorithms and ant colony optimization, can generate a search route for a given scenario. However, when the scenario changes, the solution must be recalculated. In contrast, more advanced deep reinforcement learning methods can train an agent that can be applied directly to a similar task without recalculation. Nevertheless, several challenges arise when the agent learns to search for unknown dynamic targets. In this search task, the rewards are random and sparse, which makes learning difficult. In addition, because the agent must adapt to various scenario settings, it requires more interactions with the environment than typical reinforcement learning tasks do. These challenges increase the difficulty of training agents. To address these issues, we propose the OC-MAPPO method, which combines optimal control (OC) and Multi-Agent Proximal Policy Optimization (MAPPO) with GPU parallelization. The optimal control model provides the agent with continuous and stable rewards. Through parallelized models, the agent can interact with the environment and collect data more rapidly. Experimental results demonstrate that the proposed method helps the agent learn faster, and the algorithm demonstrated a 26.97% increase in the success rate compared to genetic algorithms.


Pages: 20

38 references in total

[1] Andrychowicz Marcin, 2017, Advances in neural information processing systems, V30

[2] [Anonymous], 2010, P AIAA GUID NAV CONT

[3] Cao L, 2014, 2014 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS IEEE-ROBIO 2014, P2368, DOI 10.1109/ROBIO.2014.7090692

[4] Adaptive Search Control Applied to Search and Rescue Operations Using Unmanned Aerial Vehicles (UAVs) [J]. Chaves, A. N.; Cugnasca, P. S.; Neto, J. J. IEEE LATIN AMERICA TRANSACTIONS, 2014, 12 (07): 1278-1283

[5] Review of agricultural spraying technologies for plant protection using unmanned aerial vehicle (UAV) [J]. Chen, Haibo; Lan, Yubin; Fritz, Bradley K.; Hoffmann, W. Clint; Liu, Shengbo. INTERNATIONAL JOURNAL OF AGRICULTURAL AND BIOLOGICAL ENGINEERING, 2021, 14 (01): 38-49

[6] Hierarchical Task Assignment Strategy for Heterogeneous Multi-UAV System in Large-Scale Search and Rescue Scenarios [J]. Chen, Jie; Xiao, Kai; You, Kai; Qing, Xianguo; Ye, Fang; Sun, Qian. INTERNATIONAL JOURNAL OF AEROSPACE ENGINEERING, 2021, 2021

[7] UAV trajectory planning based on bi-directional APF-RRT* algorithm with goal-biased [J]. Fan, Jiaming; Chen, Xia; Liang, Xiao. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213

[8] UAV Swarm Search Path Planning Method Based on Probability of Containment [J]. Fan, Xiangyu; Li, Hao; Chen, You; Dong, Danna. DRONES, 2024, 8 (04)

[9] A novel hybrid particle swarm optimization for multi-UAV cooperate path planning [J]. He, Wenjian; Qi, Xiaogang; Liu, Lifang. APPLIED INTELLIGENCE, 2021, 51 (10): 7350-7364

[10] UAV Swarm Cooperative Target Search: A Multi-Agent Reinforcement Learning Approach [J]. Hou, Yukai; Zhao, Jin; Zhang, Rongqing; Cheng, Xiang; Yang, Liuqing. IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01): 568-578
