Underactuated MSV path following control via stable adversarial inverse reinforcement learning

Cited: 2
Authors
Li, Lingyu [1 ,2 ,3 ,4 ,5 ]
Ma, Yong [1 ,2 ,3 ,4 ,5 ]
Wu, Defeng [6 ]
Affiliations
[1] Wuhan Univ Technol, State Key Lab Waterway Traff Control & Safety, Wuhan 430063, Hubei, Peoples R China
[2] Wuhan Univ Technol, Sch Nav, Wuhan 430063, Hubei, Peoples R China
[3] Natl Engn Res Ctr Water Transport Safety, Wuhan 430063, Hubei, Peoples R China
[4] Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Hainan, Peoples R China
[5] Wuhan Univ Technol, Chongqing Res Inst, Chongqing 401120, Peoples R China
[6] Jimei Univ, Sch Marine Engn, Xiamen 361021, Fujian, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Underactuated marine surface vehicle; Path-following; Inverse reinforcement learning; Imitation learning; TRACKING CONTROL; USV;
DOI
10.1016/j.oceaneng.2024.117368
CLC classification
U6 [Water transportation]; P75 [Ocean engineering];
Discipline classification code
0814 ; 081505 ; 0824 ; 082401 ;
Abstract
Model-based control approaches are inadequate for the marine surface vehicle (MSV) path-following problem, especially in adverse environments. To address this problem effectively, model-free deep reinforcement learning (DRL) based methods have been developed. However, defining an efficient reward function for DRL in path-following tasks is difficult, and providing expert demonstrations is often easier than designing effective reward functions. We therefore propose a model-free stable adversarial inverse reinforcement learning (SAIRL) algorithm that uses only the state of the MSV and reconstructs the reward function from expert demonstrations. The SAIRL algorithm is designed to guarantee the prescribed MSV path-following accuracy and training stability. It employs an alternative loss function and a dual-discriminator framework to resolve the policy-collapse issue that arises from the vanishing gradient of the discriminator. Simulations and experiments validate that the SAIRL algorithm outperforms baseline algorithms in path-following accuracy and convergence stability.
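The reward reconstruction described in the abstract builds on the standard adversarial IRL formulation, in which a discriminator of the form D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + pi(a|s)) yields a recovered reward r = log D - log(1 - D) = f(s, a) - log pi(a|s). The sketch below illustrates only this generic AIRL reward recovery; the names `airl_discriminator` and `airl_reward` are illustrative, and SAIRL's specific alternative loss and dual-discriminator design are detailed in the paper itself, not here.

```python
import numpy as np

def airl_discriminator(f_value, log_pi):
    """AIRL-style discriminator: D(s, a) = exp(f) / (exp(f) + pi(a|s)),
    where f_value = f(s, a) and log_pi = log pi(a|s)."""
    return np.exp(f_value) / (np.exp(f_value) + np.exp(log_pi))

def airl_reward(f_value, log_pi):
    """Recovered reward: r = log D - log(1 - D), which simplifies
    analytically to f(s, a) - log pi(a|s)."""
    d = airl_discriminator(f_value, log_pi)
    return np.log(d) - np.log(1.0 - d)

# Example: f(s, a) = 1.0 and log pi(a|s) = -0.5 give r = 1.0 - (-0.5) = 1.5
r = airl_reward(1.0, -0.5)
```

In practice f is a learned network trained so that D separates expert transitions from policy transitions; the recovered reward then drives a standard RL update of the policy.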
Pages: 9
Related papers
50 records
  • [1] Neural-Network-Based Reinforcement Learning Control for Path Following of Underactuated Ships
    Zhang Lixing
    Qiao Lei
    Chen Jianliang
    Zhang Weidong
    PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 5786 - 5791
  • [2] Finite-time output feedback path following control of underactuated MSV based on FTESO
    Nie, Jun
    Wang, Haixia
    Lu, Xiao
    Lin, Xiaogong
    Sheng, Chunyang
    Zhang, Zhiguo
    Song, Shibin
    OCEAN ENGINEERING, 2021, 224
  • [3] Three-Dimensional Path Following Control of an Underactuated Robotic Dolphin Using Deep Reinforcement Learning
    Liu, Jincun
    Liu, Zhenna
    Wu, Zhengxing
    Yu, Junzhi
    2020 IEEE INTERNATIONAL CONFERENCE ON REAL-TIME COMPUTING AND ROBOTICS (IEEE-RCAR 2020), 2020, : 315 - 320
  • [4] Multi-path Following for Underactuated USV Based on Deep Reinforcement Learning
    Wang, Zihao
    Wu, Yaoxin
    Song, Wen
    PROCEEDINGS OF 2022 INTERNATIONAL CONFERENCE ON AUTONOMOUS UNMANNED SYSTEMS, ICAUS 2022, 2023, 1010 : 3525 - 3535
  • [5] Multiagent Adversarial Inverse Reinforcement Learning
    Wei, Ermo
    Wicke, Drew
    Luke, Sean
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 2265 - 2266
  • [6] Hierarchical Adversarial Inverse Reinforcement Learning
    Chen, Jiayu
    Lan, Tian
    Aggarwal, Vaneet
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (12) : 17549 - 17558
  • [7] FAILOS guidance law based adaptive fuzzy finite-time path following control for underactuated MSV
    Nie, Jun
    Lin, Xiaogong
    OCEAN ENGINEERING, 2020, 195
  • [8] Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control
    Chen, Jiayu
    Lan, Tian
    Aggarwal, Vaneet
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 5902 - 5908
  • [9] Straight-Path Following for Underactuated Marine Vessels using Deep Reinforcement Learning
    Martinsen, Andreas B.
    Lekkas, Anastasios M.
    IFAC PAPERSONLINE, 2018, 51 (29): : 329 - 334
  • [10] Robust Formation Control for Cooperative Underactuated Quadrotors via Reinforcement Learning
    Zhao, Wanbing
    Liu, Hao
    Lewis, Frank L.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (10) : 4577 - 4587