Research on Ship Trajectory Control Based on Deep Reinforcement Learning

Cited: 0
Authors
Xu, Lixin [1 ,2 ]
Chen, Jiarong [3 ]
Hong, Zhichao [1 ,2 ]
Xu, Shengqing [3 ]
Zhang, Sheng [4 ]
Shi, Lin [5 ]
Affiliations
[1] Jiangsu Univ Sci & Technol, Ocean Coll, Zhenjiang 212003, Peoples R China
[2] Jiangsu Marine Technol Innovat Ctr, Nantong 226000, Peoples R China
[3] Jiangsu Univ Sci & Technol, Sch Naval Architecture & Ocean Engn, Zhenjiang 212000, Peoples R China
[4] Zhenjiang Yuanli Innovat Technol Co Ltd, Zhenjiang 212008, Peoples R China
[5] China Construct Civil Infrastruct Co Ltd CSC, Beijing 100029, Peoples R China
Source
JOURNAL OF MARINE SCIENCE AND ENGINEERING | 2025, Vol. 13, Issue 04
Fund
National Key Research and Development Program of China
Keywords
trajectory control; deep reinforcement learning; trajectory tracking controller; reward function; collision avoidance; path
DOI
10.3390/jmse13040792
CLC Number
U6 [Waterway Transportation]; P75 [Ocean Engineering]
Subject Classification Codes
0814; 081505; 0824; 082401
Abstract
Ship trajectory tracking controllers based on deep reinforcement learning (DRL) are widely applied in fields such as autonomous driving and robotics because of their strong adaptive learning and decision-optimization capabilities. Ship trajectory control, however, still faces long training cycles and poor convergence, problems caused primarily by poorly designed algorithm models and reward functions, and these shortcomings limit performance optimization and energy-efficiency gains in real-world navigation. In this paper, we propose a DRL-based ship trajectory tracking control algorithm that introduces maximum-entropy theory and experience replay, and that enhances the reward-function module with additional reward terms and fitted weight designs. A three-dimensional simulation environment is constructed to validate the proposed method. The results demonstrate that the proposed controller outperforms traditional DRL controllers in convergence speed, convergence stability, and final reward value. The controller meets the requirements for tracking conventional trajectories and performs stably and efficiently in both wide-area water-search and river-channel traversal experiments. These results offer useful guidance for future research.
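The abstract names three ingredients (maximum-entropy learning, experience replay, and a reward function with additional weighted terms) without giving implementation details. As a rough illustration only, the Python sketch below shows what these pieces commonly look like in a SAC-style tracking controller; the buffer size, reward terms (cross_track_err, heading_err, rudder_rate), weights (w_pos, w_head, w_act), and entropy coefficient alpha are all assumed for illustration and are not taken from the paper.

import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience-replay buffer; oldest transitions are discarded first."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks temporal correlation between training samples.
        return random.sample(self.buffer, batch_size)

def tracking_reward(cross_track_err, heading_err, rudder_rate,
                    w_pos=1.0, w_head=0.5, w_act=0.1):
    """Weighted multi-term reward (weights are illustrative assumptions):
    penalize path deviation, heading deviation, and aggressive rudder use."""
    return -(w_pos * cross_track_err**2
             + w_head * heading_err**2
             + w_act * rudder_rate**2)

def soft_value_target(reward, next_q, log_prob_next, alpha=0.2, gamma=0.99):
    """Maximum-entropy Bellman target: the entropy bonus -alpha * log pi(a|s)
    is folded into the value of the next state, SAC-style."""
    return reward + gamma * (next_q - alpha * log_prob_next)

if __name__ == "__main__":
    buf = ReplayBuffer()
    buf.push(state=(0.0, 0.0), action=0.1,
             reward=tracking_reward(0.5, 0.05, 0.02),
             next_state=(0.1, 0.0), done=False)
    print(buf.sample(1))

In this family of methods, the entropy bonus keeps the policy exploratory early in training, which is one common explanation for the faster and more stable convergence the abstract reports.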
Pages: 21