Underactuated MSV path following control via stable adversarial inverse reinforcement learning

Cited: 2
Authors
Li, Lingyu [1 ,2 ,3 ,4 ,5 ]
Ma, Yong [1 ,2 ,3 ,4 ,5 ]
Wu, Defeng [6 ]
Affiliations
[1] Wuhan Univ Technol, State Key Lab Waterway Traff Control & Safety, Wuhan 430063, Hubei, Peoples R China
[2] Wuhan Univ Technol, Sch Nav, Wuhan 430063, Hubei, Peoples R China
[3] Natl Engn Res Ctr Water Transport Safety, Wuhan 430063, Hubei, Peoples R China
[4] Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Hainan, Peoples R China
[5] Wuhan Univ Technol, Chongqing Res Inst, Chongqing 401120, Peoples R China
[6] Jimei Univ, Sch Marine Engn, Xiamen 361021, Fujian, Peoples R China
Funding
National Science Foundation (US);
Keywords
Underactuated marine surface vehicle; Path-following; Inverse reinforcement learning; Imitation learning; TRACKING CONTROL; USV;
DOI
10.1016/j.oceaneng.2024.117368
CLC Classification
U6 [Water Transport]; P75 [Ocean Engineering];
Subject Classification
0814; 081505; 0824; 082401;
Abstract
Model-based control approaches are inadequate for the marine surface vehicle (MSV) path-following problem, especially in adverse environments. To deal with this problem effectively, model-free deep reinforcement learning (DRL) methods have been developed. However, defining an efficient reward function for DRL in path-following tasks is rather difficult, and providing expert demonstrations is often easier than designing effective reward functions. We therefore propose a model-free stable adversarial inverse reinforcement learning (SAIRL) algorithm that adopts only the state of the MSV and reconstructs the reward function from expert demonstrations. The SAIRL algorithm is designed to guarantee the prescribed MSV path-following accuracy and training stability. It employs an alternative loss function and a dual-discriminator framework to resolve policy collapse, which arises from the vanishing gradient of the discriminator. Simulations and experiments validate that the SAIRL algorithm outperforms baseline algorithms in path-following accuracy and stability of convergence.
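The reward reconstruction the abstract describes follows the adversarial IRL recipe, in which a discriminator of the form D = exp(f) / (exp(f) + pi) recovers a reward log D - log(1 - D) that simplifies to f - log pi. A minimal numeric sketch of this identity in Python with NumPy (generic AIRL, not the paper's dual-discriminator SAIRL; the function name is illustrative):

```python
import numpy as np

def airl_reward(f_logit, log_pi):
    """AIRL-style reward recovered from the discriminator.

    With D(s, a) = exp(f) / (exp(f) + pi(a|s)), the quantity
    log D - log(1 - D) simplifies algebraically to f - log pi(a|s),
    so the learned potential f acts as the reconstructed reward.
    """
    return f_logit - log_pi

# Tiny numeric check of the identity log D - log(1 - D) == f - log pi
f = np.array([0.5, -1.0, 2.0])        # discriminator potential f(s, a)
log_pi = np.array([-0.7, -2.3, -0.1])  # policy log-probabilities
D = np.exp(f) / (np.exp(f) + np.exp(log_pi))
direct = np.log(D) - np.log(1.0 - D)
assert np.allclose(direct, airl_reward(f, log_pi))
```

In a full training loop, f would be a learned network and the recovered reward would drive the policy update; the sketch only checks the discriminator-to-reward algebra.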
Pages: 9
Related Papers
50 records
  • [21] Learning Aircraft Pilot Skills by Adversarial Inverse Reinforcement Learning
    Suzuki, Kaito
    Uemura, Tsuneharu
    Tsuchiya, Takeshi
    Beppu, Hirofumi
    Hazui, Yusuke
    Ono, Hitoi
    2023 ASIA-PACIFIC INTERNATIONAL SYMPOSIUM ON AEROSPACE TECHNOLOGY, VOL I, APISAT 2023, 2024, 1050 : 1431 - 1441
  • [22] Reinforcement Learning Algorithm for Path Following Control of Articulated Vehicle
    Shao J.
    Zhao X.
    Yang J.
    Zhang W.
    Kang Y.
    Zhao X.
    Yang, Jue (yangjue@ustb.edu.cn), 2017, Chinese Society of Agricultural Machinery (48): 376-382
  • [23] Stable Inverse Reinforcement Learning: Policies From Control Lyapunov Landscapes
    Tesfazgi, Samuel
    Sprandl, Leonhard
    Lederer, Armin
    Hirche, Sandra
    IEEE OPEN JOURNAL OF CONTROL SYSTEMS, 2024, 3 : 358 - 374
  • [24] Decision Making for Driving Agent in Traffic Simulation via Adversarial Inverse Reinforcement Learning
    Zhong, Naiting
    Chen, Junyi
    Ma, Yining
    Jiang, Wei
    2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023: 2295-2301
  • [25] Adversarial deep reinforcement learning based robust depth tracking control for underactuated autonomous underwater vehicle
    Wang, Zhao
    Xiang, Xianbo
    Duan, Yu
    Yang, Shaolong
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 130
  • [26] Multi-Agent Adversarial Inverse Reinforcement Learning
    Yu, Lantao
    Song, Jiaming
    Ermon, Stefano
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [27] Adaptive Dynamic Surface Control for Cooperative Path Following of Underactuated Marine Surface Vehicles via Low Frequency Learning
    Wang Hao
    Wang Dan
    Peng Zhouhua
    Wang Wei
    2013 32ND CHINESE CONTROL CONFERENCE (CCC), 2013: 556-561
  • [28] Data-based Formation Control for Underactuated Quadrotor Team via Reinforcement Learning
    Li, Hao
    Zhao, Wanbing
    Lewis, Frank L.
    Jiang, Zhong-Ping
    Modares, Hamidreza
    PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020: 6816-6821
  • [29] Data-driven path-following control of underactuated ships based on antenna mutation beetle swarm predictive reinforcement learning
    Wang, Le
    Li, Shijie
    Liu, Jialun
    Wu, Qing
    APPLIED OCEAN RESEARCH, 2022, 124
  • [30] Inverse-Inverse Reinforcement Learning. How to Hide Strategy from an Adversarial Inverse Reinforcement Learner
    Pattanayak, Kunal
    Krishnamurthy, Vikram
    Berry, Christopher
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022: 3631-3636