Underactuated MSV path following control via stable adversarial inverse reinforcement learning

Cited: 2
Authors
Li, Lingyu [1 ,2 ,3 ,4 ,5 ]
Ma, Yong [1 ,2 ,3 ,4 ,5 ]
Wu, Defeng [6 ]
Affiliations
[1] Wuhan Univ Technol, State Key Lab Waterway Traff Control & Safety, Wuhan 430063, Hubei, Peoples R China
[2] Wuhan Univ Technol, Sch Nav, Wuhan 430063, Hubei, Peoples R China
[3] Natl Engn Res Ctr Water Transport Safety, Wuhan 430063, Hubei, Peoples R China
[4] Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Hainan, Peoples R China
[5] Wuhan Univ Technol, Chongqing Res Inst, Chongqing 401120, Peoples R China
[6] Jimei Univ, Sch Marine Engn, Xiamen 361021, Fujian, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Underactuated marine surface vehicle; Path-following; Inverse reinforcement learning; Imitation learning; TRACKING CONTROL; USV;
DOI
10.1016/j.oceaneng.2024.117368
CLC classification
U6 [Water transportation]; P75 [Ocean engineering];
Discipline classification code
0814 ; 081505 ; 0824 ; 082401 ;
Abstract
Model-based control approaches are inadequate for the marine surface vehicle (MSV) path-following problem, especially in adverse environments. To address this problem effectively, model-free deep reinforcement learning (DRL) based methods have been developed. However, defining an efficient reward function for DRL in path-following tasks is difficult, and providing expert demonstrations is often easier than designing effective reward functions. We therefore propose a model-free stable adversarial inverse reinforcement learning (SAIRL) algorithm that uses only the state of the MSV and reconstructs the reward function from expert demonstrations. The SAIRL algorithm is designed to guarantee the prescribed MSV path-following accuracy and training stability. It employs an alternative loss function and a dual-discriminator framework to resolve the policy-collapse issue that arises from the vanishing gradient of the discriminator. Simulations and experiments validate that the SAIRL algorithm outperforms baseline algorithms in path-following accuracy and convergence stability.
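The reward reconstruction described in the abstract builds on the standard adversarial IRL formulation, in which a discriminator of the form D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + pi(a|s)) yields a recovered reward r = log D - log(1 - D) = f(s, a) - log pi(a|s). The sketch below illustrates only this generic AIRL reward recovery; the names `airl_discriminator` and `airl_reward` are illustrative, and SAIRL's specific alternative loss and dual-discriminator design are detailed in the paper itself, not here.

```python
import numpy as np

def airl_discriminator(f_value, log_pi):
    """AIRL-style discriminator: D(s, a) = exp(f) / (exp(f) + pi(a|s)),
    where f_value = f(s, a) and log_pi = log pi(a|s)."""
    return np.exp(f_value) / (np.exp(f_value) + np.exp(log_pi))

def airl_reward(f_value, log_pi):
    """Recovered reward: r = log D - log(1 - D), which simplifies
    analytically to f(s, a) - log pi(a|s)."""
    d = airl_discriminator(f_value, log_pi)
    return np.log(d) - np.log(1.0 - d)

# Example: f(s, a) = 1.0 and log pi(a|s) = -0.5 give r = 1.0 - (-0.5) = 1.5
r = airl_reward(1.0, -0.5)
```

In practice f is a learned network trained so that D separates expert transitions from policy transitions; the recovered reward then drives a standard RL update of the policy.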
Pages: 9
Related papers
50 records
  • [1] Neural-Network-Based Reinforcement Learning Control for Path Following of Underactuated Ships
    Zhang Lixing
    Qiao Lei
    Chen Jianliang
    Zhang Weidong
    PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 5786 - 5791
  • [2] Finite-time output feedback path following control of underactuated MSV based on FTESO
    Nie, Jun
    Wang, Haixia
    Lu, Xiao
    Lin, Xiaogong
    Sheng, Chunyang
    Zhang, Zhiguo
    Song, Shibin
    OCEAN ENGINEERING, 2021, 224
  • [3] Three-Dimensional Path Following Control of an Underactuated Robotic Dolphin Using Deep Reinforcement Learning
    Liu, Jincun
    Liu, Zhenna
    Wu, Zhengxing
    Yu, Junzhi
    2020 IEEE INTERNATIONAL CONFERENCE ON REAL-TIME COMPUTING AND ROBOTICS (IEEE-RCAR 2020), 2020, : 315 - 320
  • [4] Multi-path Following for Underactuated USV Based on Deep Reinforcement Learning
    Wang, Zihao
    Wu, Yaoxin
    Song, Wen
    PROCEEDINGS OF 2022 INTERNATIONAL CONFERENCE ON AUTONOMOUS UNMANNED SYSTEMS, ICAUS 2022, 2023, 1010 : 3525 - 3535
  • [5] Multiagent Adversarial Inverse Reinforcement Learning
    Wei, Ermo
    Wicke, Drew
    Luke, Sean
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 2265 - 2266
  • [6] Hierarchical Adversarial Inverse Reinforcement Learning
    Chen, Jiayu
    Lan, Tian
    Aggarwal, Vaneet
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (12) : 17549 - 17558
  • [7] FAILOS guidance law based adaptive fuzzy finite-time path following control for underactuated MSV
    Nie, Jun
    Lin, Xiaogong
    OCEAN ENGINEERING, 2020, 195
  • [8] Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control
    Chen, Jiayu
    Lan, Tian
    Aggarwal, Vaneet
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 5902 - 5908
  • [9] Straight-Path Following for Underactuated Marine Vessels using Deep Reinforcement Learning
    Martinsen, Andreas B.
    Lekkas, Anastasios M.
    IFAC PAPERSONLINE, 2018, 51 (29): : 329 - 334
  • [10] Robust Formation Control for Cooperative Underactuated Quadrotors via Reinforcement Learning
    Zhao, Wanbing
    Liu, Hao
    Lewis, Frank L.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (10) : 4577 - 4587