Underactuated MSV path following control via stable adversarial inverse reinforcement learning

Cited: 2
Authors
Li, Lingyu [1 ,2 ,3 ,4 ,5 ]
Ma, Yong [1 ,2 ,3 ,4 ,5 ]
Wu, Defeng [6 ]
Affiliations
[1] Wuhan Univ Technol, State Key Lab Waterway Traff Control & Safety, Wuhan 430063, Hubei, Peoples R China
[2] Wuhan Univ Technol, Sch Nav, Wuhan 430063, Hubei, Peoples R China
[3] Natl Engn Res Ctr Water Transport Safety, Wuhan 430063, Hubei, Peoples R China
[4] Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Hainan, Peoples R China
[5] Wuhan Univ Technol, Chongqing Res Inst, Chongqing 401120, Peoples R China
[6] Jimei Univ, Sch Marine Engn, Xiamen 361021, Fujian, Peoples R China
Funding
National Science Foundation (US);
Keywords
Underactuated marine surface vehicle; Path-following; Inverse reinforcement learning; Imitation learning; TRACKING CONTROL; USV;
DOI
10.1016/j.oceaneng.2024.117368
CLC Classification
U6 [Water Transport]; P75 [Ocean Engineering];
Subject Classification
0814; 081505; 0824; 082401;
Abstract
Model-based control approaches are inadequate for the marine surface vehicle (MSV) path-following problem, especially in adverse environments. To deal with this problem effectively, model-free deep reinforcement learning (DRL) methods have been developed. However, defining an efficient reward function for DRL in path-following tasks is rather difficult, and providing expert demonstrations is often easier than designing effective reward functions. We therefore propose a model-free stable adversarial inverse reinforcement learning (SAIRL) algorithm that adopts only the state of the MSV and reconstructs the reward function from expert demonstrations. The SAIRL algorithm is designed to guarantee the prescribed MSV path-following accuracy and training stability. It employs an alternative loss function and a dual-discriminator framework to resolve policy collapse, which arises from the vanishing gradient of the discriminator. Simulations and experiments validate that the SAIRL algorithm outperforms baseline algorithms in path-following accuracy and stability of convergence.
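The reward reconstruction the abstract describes follows the adversarial IRL recipe, in which a discriminator of the form D = exp(f) / (exp(f) + pi) recovers a reward log D - log(1 - D) that simplifies to f - log pi. A minimal numeric sketch of this identity in Python with NumPy (generic AIRL, not the paper's dual-discriminator SAIRL; the function name is illustrative):

```python
import numpy as np

def airl_reward(f_logit, log_pi):
    """AIRL-style reward recovered from the discriminator.

    With D(s, a) = exp(f) / (exp(f) + pi(a|s)), the quantity
    log D - log(1 - D) simplifies algebraically to f - log pi(a|s),
    so the learned potential f acts as the reconstructed reward.
    """
    return f_logit - log_pi

# Tiny numeric check of the identity log D - log(1 - D) == f - log pi
f = np.array([0.5, -1.0, 2.0])        # discriminator potential f(s, a)
log_pi = np.array([-0.7, -2.3, -0.1])  # policy log-probabilities
D = np.exp(f) / (np.exp(f) + np.exp(log_pi))
direct = np.log(D) - np.log(1.0 - D)
assert np.allclose(direct, airl_reward(f, log_pi))
```

In a full training loop, f would be a learned network and the recovered reward would drive the policy update; the sketch only checks the discriminator-to-reward algebra.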
Pages: 9
Related Papers
50 records
  • [21] Learning Aircraft Pilot Skills by Adversarial Inverse Reinforcement Learning
    Suzuki, Kaito
    Uemura, Tsuneharu
    Tsuchiya, Takeshi
    Beppu, Hirofumi
    Hazui, Yusuke
    Ono, Hitoi
    2023 ASIA-PACIFIC INTERNATIONAL SYMPOSIUM ON AEROSPACE TECHNOLOGY, VOL I, APISAT 2023, 2024, 1050 : 1431 - 1441
  • [22] Reinforcement Learning Algorithm for Path Following Control of Articulated Vehicle
    Shao J.
    Zhao X.
    Yang J.
    Zhang W.
    Kang Y.
    Zhao X.
    Yang, Jue (yangjue@ustb.edu.cn), 2017, Chinese Society of Agricultural Machinery (48): 376-382
  • [23] Stable Inverse Reinforcement Learning: Policies From Control Lyapunov Landscapes
    Tesfazgi, Samuel
    Sprandl, Leonhard
    Lederer, Armin
    Hirche, Sandra
    IEEE OPEN JOURNAL OF CONTROL SYSTEMS, 2024, 3 : 358 - 374
  • [24] Decision Making for Driving Agent in Traffic Simulation via Adversarial Inverse Reinforcement Learning
    Zhong, Naiting
    Chen, Junyi
    Ma, Yining
    Jiang, Wei
    2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023: 2295-2301
  • [25] Adversarial deep reinforcement learning based robust depth tracking control for underactuated autonomous underwater vehicle
    Wang, Zhao
    Xiang, Xianbo
    Duan, Yu
    Yang, Shaolong
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 130
  • [26] Multi-Agent Adversarial Inverse Reinforcement Learning
    Yu, Lantao
    Song, Jiaming
    Ermon, Stefano
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [27] Adaptive Dynamic Surface Control for Cooperative Path Following of Underactuated Marine Surface Vehicles via Low Frequency Learning
    Wang Hao
    Wang Dan
    Peng Zhouhua
    Wang Wei
    2013 32ND CHINESE CONTROL CONFERENCE (CCC), 2013: 556-561
  • [28] Data-based Formation Control for Underactuated Quadrotor Team via Reinforcement Learning
    Li, Hao
    Zhao, Wanbing
    Lewis, Frank L.
    Jiang, Zhong-Ping
    Modares, Hamidreza
    PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020: 6816-6821
  • [29] Data-driven path-following control of underactuated ships based on antenna mutation beetle swarm predictive reinforcement learning
    Wang, Le
    Li, Shijie
    Liu, Jialun
    Wu, Qing
    APPLIED OCEAN RESEARCH, 2022, 124
  • [30] Inverse-Inverse Reinforcement Learning. How to Hide Strategy from an Adversarial Inverse Reinforcement Learner
    Pattanayak, Kunal
    Krishnamurthy, Vikram
    Berry, Christopher
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022: 3631-3636