Path Planning for Multi-UAV Based on Improved Proximal Policy Optimization Algorithm

Cited by: 0

Authors:

Zhu, Wenya [1]

Fang, Wenxing [2,3]

Su, Yanxu [1,4]

Affiliations:

[1] Anhui Univ, Sch Artificial Intelligence, Hefei, Peoples R China

[2] Anhui Univ, Inst Phys Sci, Hefei, Peoples R China

[3] Anhui Univ, Inst Informat Technol, Hefei, Peoples R China

[4] Minist Educ, Engn Res Ctr Autonomous Unmanned Syst Technol, Hefei, Peoples R China

Source:

39TH YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION, YAC 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

reinforcement learning; proximal policy optimization; UAV; path planning;

DOI:

10.1109/YAC63405.2024.10598516

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This paper explores the application of reinforcement learning to path planning for multiple unmanned aerial vehicles (multi-UAV). The traditional proximal policy optimization (PPO) algorithm suffers from low sample efficiency and unstable performance. We introduce a refined version of PPO called RB-PPO (Proximal Policy Optimization with Replay Buffer). RB-PPO uses off-policy data stored in a replay buffer to improve the sample efficiency of PPO. Furthermore, it incorporates a rollback operation into the objective function to constrain the difference between the new and old policies, making policy updates more stable. RB-PPO thus combines the stability of on-policy algorithms with the sample efficiency of off-policy algorithms. The experimental results indicate that RB-PPO converges faster and achieves higher training rewards than PPO.
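The two ingredients named in the abstract, a replay buffer and a rollback term in the objective, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption: the rollback function is one form used in the PPO literature (a negative slope `-alpha` outside the clipping range in place of a flat clip), and the names `rollback_ratio`, `rollback_surrogate`, `ReplayBuffer`, `eps`, and `alpha` are hypothetical. It is not the authors' implementation.

```python
import math
import random
from collections import deque


def rollback_ratio(ratio, eps=0.2, alpha=0.3):
    """Assumed rollback function: identity inside [1-eps, 1+eps],
    slope -alpha outside, continuous at both boundaries."""
    if ratio <= 1.0 - eps:
        return -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    if ratio >= 1.0 + eps:
        return -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    return ratio


def rollback_surrogate(new_logp, old_logp, advantage, eps=0.2, alpha=0.3):
    """Per-sample surrogate: min(r * A, F(r) * A), where r is the
    probability ratio between the new and the old (behavior) policy."""
    ratio = math.exp(new_logp - old_logp)
    return min(ratio * advantage,
               rollback_ratio(ratio, eps, alpha) * advantage)


class ReplayBuffer:
    """Fixed-capacity store of past transitions (hypothetical layout).
    Each entry keeps the log-probability under the policy that
    collected it, so the ratio can be recomputed at update time."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest entries are evicted

    def __len__(self):
        return len(self.data)

    def add(self, state, action, old_logp, advantage):
        self.data.append((state, action, old_logp, advantage))

    def sample(self, batch_size, rng=random):
        # Sample without replacement, capped at the current buffer size.
        k = min(batch_size, len(self.data))
        return rng.sample(list(self.data), k)


def batch_objective(batch, new_logp_fn, eps=0.2, alpha=0.3):
    """Mean surrogate over a sampled batch; new_logp_fn(state, action)
    is a hypothetical callable returning the current policy's log-prob."""
    values = [
        rollback_surrogate(new_logp_fn(s, a), old_logp, adv, eps, alpha)
        for (s, a, old_logp, adv) in batch
    ]
    return sum(values) / len(values)
```

With a flat clip, the objective stops changing once the ratio leaves the clipping range. With the rollback slope, the objective decreases as the ratio moves further out, which penalizes large departures from the old policy. In a training loop, `batch_objective` would be maximized over mini-batches drawn from the buffer.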


Pages: 1895-1899

Page count: 5
