An Optimized Path Planning Method for Coastal Ships Based on Improved DDPG and DP

Cited by: 30
Authors
Du, Yiquan [1]
Zhang, Xiuguo [1]
Cao, Zhiying [1]
Wang, Shaobo [2]
Liang, Jiacheng [1]
Zhang, Fengge [1]
Tang, Jiawei [1]
Affiliations
[1] Dalian Maritime Univ, Sch Informat Sci & Technol, Dalian 116026, Peoples R China
[2] Dalian Maritime Univ, Sch Nav, Dalian 116026, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
COLLISION-AVOIDANCE; REINFORCEMENT; MODEL;
DOI
10.1155/2021/7765130
Chinese Library Classification (CLC)
TU [Building Science];
Discipline Code
0813;
Abstract
Deep Reinforcement Learning (DRL) is widely used in path planning because of the powerful function-fitting and learning abilities of neural networks. However, existing DRL-based methods use a discrete action space and ignore historical state information, so the algorithm cannot learn the optimal strategy for planning a path, and the planned path contains arcs or too many corners, which does not meet the actual sailing requirements of a ship. In this paper, an optimized path planning method for coastal ships based on an improved Deep Deterministic Policy Gradient (DDPG) and the Douglas-Peucker (DP) algorithm is proposed. First, Long Short-Term Memory (LSTM) is used to improve the network structure of DDPG, using historical state information to approximate the current environmental state so that the predicted action is more accurate. In addition, because the traditional reward function of DDPG can lead to low learning efficiency and slow convergence, the reward principle is improved with a mainline reward function and an auxiliary reward function, which not only helps to plan a better path for the ship but also speeds up the convergence of the model. Second, because too many turning points in the planned path may increase navigation risk, an improved DP algorithm is proposed to further optimize the path, making the final path safer and more economical. Finally, simulation experiments verify the proposed method in terms of path planning effect and convergence trend. The results show that the proposed method can plan safe and economical navigation paths and has good stability and convergence.
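To make the two main building blocks of the abstract concrete, here is a minimal PyTorch sketch of an actor network that conditions on a short history of states, in the spirit of the LSTM-improved DDPG described above. The class name, layer sizes, and history length are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Actor for continuous control that reads a sequence of past
    states through an LSTM instead of a single current state.
    Sketch only: sizes are illustrative, not the paper's design."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
            nn.Tanh(),  # continuous actions bounded in [-1, 1]
        )

    def forward(self, state_seq: torch.Tensor) -> torch.Tensor:
        # state_seq: (batch, seq_len, state_dim) history of observations
        out, _ = self.lstm(state_seq)
        return self.head(out[:, -1])  # act on the last hidden state

actor = LSTMActor(state_dim=8, action_dim=2)
history = torch.randn(1, 10, 8)  # one batch entry: 10 past states
action = actor(history)          # shape (1, 2), e.g. heading and speed commands
```

The second building block, DP path simplification, removes redundant turning points. Below is the classical Douglas-Peucker algorithm on which the paper's improved variant is based; `epsilon` is the deviation tolerance, and the example path is made up for illustration.

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate chord: a and b coincide
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    """Recursively drop waypoints that deviate less than epsilon
    from the chord between the first and last point."""
    if len(points) < 3:
        return list(points)
    # Find the waypoint farthest from the chord.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax > epsilon:
        # Keep the farthest point and simplify both halves.
        left = douglas_peucker(points[:index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right
    # Everything in between is within tolerance: keep only the endpoints.
    return [points[0], points[-1]]

# A jagged planned path collapses to its essential corner points.
path = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(path, epsilon=0.5))
# -> [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```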
Pages: 23