Deep reinforcement learning with intrinsic curiosity module based trajectory tracking control for USV

Cited by: 11

Authors:

Wu, Chuanbo [1]

Yu, Wanneng [1,2,3]

Liao, Weiqiang [1,2,3]

Ou, Yanghangcheng [1]

Affiliations:

[1] Jimei Univ, Sch Marine Engn, Xiamen 361021, Peoples R China

[2] Fujian Prov Key Lab Naval Architecture & Ocean Eng, Xiamen 361021, Peoples R China

[3] Fujian Engn & Res Ctr, Offshore Small Green Intelligent Ship Syst, Xiamen 361021, Peoples R China

Source:

OCEAN ENGINEERING | 2024 / Vol. 308

Funding:

National Natural Science Foundation of China;

Keywords:

Deep reinforcement learning; Intrinsic curiosity module; Trajectory tracking; USV;

DOI:

10.1016/j.oceaneng.2024.118342

Chinese Library Classification (CLC):

U6 [Waterway Transportation]; P75 [Ocean Engineering];

Discipline classification codes:

0814 ; 081505 ; 0824 ; 082401 ;

Abstract:

Unmanned surface vehicle (USV) systems are highly coupled and nonlinear, and they are subject to environmental disturbances from winds and currents. This makes it challenging to achieve accurate trajectory tracking by directly controlling the underlying parameters, such as rudder and rotational speed. Therefore, this paper proposes a proximal policy optimisation (PPO) control scheme based on an intrinsic curiosity module (ICM). First, according to the training characteristics of deep reinforcement learning (DRL) algorithms, an improved guidance law is proposed. It effectively solves the problem of the desired speed exceeding the maximum allowable speed, which arises from the large tracking error caused by the random exploration of the USV at the early stage of training. Unlike traditional DRL methods, the proposed method incorporates intrinsic rewards alongside the extrinsic rewards from the training environment. These intrinsic rewards, generated by the ICM, incentivize the agent to actively explore unknown states and acquire new knowledge, which can enhance training outcomes and prevent premature model convergence. Finally, multiple tracking scenarios containing both simple and complex trajectories are designed and constructed for testing, and the simulation results show that the ICM-PPO method performs well in the accurate trajectory tracking problem.
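The abstract does not give the reward formulas, so the following is only a minimal sketch of how an intrinsic curiosity bonus is commonly combined with an extrinsic reward. It assumes the widely used ICM formulation in which the bonus is the scaled squared error between a forward model's predicted next-state features and the actual next-state features. The function names and the parameters eta and beta are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def intrinsic_reward(pred_next_feat: Sequence[float],
                     next_feat: Sequence[float],
                     eta: float = 0.01) -> float:
    """Curiosity bonus: (eta / 2) * squared prediction error in feature space.

    pred_next_feat: forward-model prediction of the next-state features.
    next_feat:      features actually observed for the next state.
    eta:            scaling factor (hypothetical value, not from the paper).
    """
    if len(pred_next_feat) != len(next_feat):
        raise ValueError("feature vectors must have the same length")
    sq_err = sum((p - q) ** 2 for p, q in zip(pred_next_feat, next_feat))
    return 0.5 * eta * sq_err


def total_reward(extrinsic: float, intrinsic: float, beta: float = 1.0) -> float:
    """Reward passed to the policy update: extrinsic plus weighted curiosity.

    beta is an assumed weighting; the paper's actual weighting is not stated
    in the abstract.
    """
    return extrinsic + beta * intrinsic


if __name__ == "__main__":
    # A poorly predicted transition earns a larger bonus than a well predicted one.
    novel = intrinsic_reward([0.0, 0.0], [1.0, 2.0], eta=0.1)
    familiar = intrinsic_reward([1.0, 2.0], [1.0, 2.0], eta=0.1)
    print(total_reward(-1.0, novel), total_reward(-1.0, familiar))
```

In this sketch a transition the forward model already predicts perfectly contributes no bonus, so the agent is rewarded only for visiting states it cannot yet predict, which is the exploration incentive the abstract describes.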


Pages: 14
