Underactuated MSV path following control via stable adversarial inverse reinforcement learning

Cited by: 2

Authors:

Li, Lingyu [1,2,3,4,5]

Ma, Yong [1,2,3,4,5]

Wu, Defeng [6]

Affiliations:

[1] Wuhan Univ Technol, State Key Lab Waterway Traff Control & Safety, Wuhan 430063, Hubei, Peoples R China

[2] Wuhan Univ Technol, Sch Nav, Wuhan 430063, Hubei, Peoples R China

[3] Natl Engn Res Ctr Water Transport Safety, Wuhan 430063, Hubei, Peoples R China

[4] Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Hainan, Peoples R China

[5] Wuhan Univ Technol, Chongqing Res Inst, Chongqing 401120, Peoples R China

[6] Jimei Univ, Sch Marine Engn, Xiamen 361021, Fujian, Peoples R China

Source:

OCEAN ENGINEERING | 2024 / Vol. 299

Funding:

U.S. National Science Foundation;

Keywords:

Underactuated marine surface vehicle; Path-following; Inverse reinforcement learning; Imitation learning; TRACKING CONTROL; USV;

DOI:

10.1016/j.oceaneng.2024.117368

Chinese Library Classification number:

U6 [Waterway Transportation]; P75 [Ocean Engineering];

Discipline classification code:

0814 ; 081505 ; 0824 ; 082401 ;

Abstract:

Model-based control approaches are inadequate for the marine surface vehicle (MSV) path-following problem, especially in adverse environments. Model-free deep reinforcement learning (DRL) methods have been developed to address this problem. However, defining an efficient reward function for DRL in path-following tasks is difficult, and providing expert demonstrations is often easier than designing an effective reward function. We therefore propose a model-free stable adversarial inverse reinforcement learning (SAIRL) algorithm that uses only the state of the MSV and reconstructs the reward function from expert demonstrations. The SAIRL algorithm is designed to guarantee the prescribed MSV path-following accuracy and training stability. It uses an alternative loss function and a dual-discriminator framework to resolve the issue of policy collapse, which arises from the vanishing gradient of the discriminator. Simulations and experiments validate that the SAIRL algorithm outperforms baseline algorithms in path-following accuracy and stability of convergence.
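The reward-reconstruction and vanishing-gradient ideas mentioned in the abstract can be illustrated with a minimal sketch. It is not the SAIRL algorithm itself: it assumes the standard adversarial IRL discriminator form D = exp(f) / (exp(f) + pi) and the textbook saturating versus non-saturating adversarial losses. The function names (`airl_reward`, `saturating_grad`, `non_saturating_grad`) and the numeric inputs are hypothetical, and the paper's actual loss function and dual-discriminator design are not reproduced here.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def airl_discriminator(f_value: float, log_pi: float) -> float:
    """Standard AIRL form D = exp(f) / (exp(f) + pi) = sigmoid(f - log pi).

    f_value: learned reward-like score (assumed state-only here).
    log_pi:  log-probability of the action under the current policy.
    """
    return sigmoid(f_value - log_pi)


def airl_reward(f_value: float, log_pi: float) -> float:
    """Reward recovered from the discriminator: log D - log(1 - D)."""
    d = airl_discriminator(f_value, log_pi)
    return math.log(d) - math.log(1.0 - d)


def saturating_grad(logit: float) -> float:
    """Gradient of log(1 - sigmoid(logit)) with respect to the logit."""
    return -sigmoid(logit)


def non_saturating_grad(logit: float) -> float:
    """Gradient of -log(sigmoid(logit)) with respect to the logit."""
    return -(1.0 - sigmoid(logit))


if __name__ == "__main__":
    # The recovered reward equals f - log pi (illustrative values).
    print(airl_reward(0.5, -1.2))
    # When the discriminator confidently rejects policy samples, the
    # saturating loss gives almost no learning signal.
    print(saturating_grad(-20.0), non_saturating_grad(-20.0))
```

Under these assumptions, the recovered reward reduces to f - log pi, and the saturating loss gradient shrinks toward zero as the discriminator logit becomes strongly negative, while the non-saturating alternative keeps a gradient close to -1. This is one common way an alternative loss can counteract a vanishing gradient; whether SAIRL uses this particular form is not stated in the abstract.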

Pages: 9

