Attention-Based Meta-Reinforcement Learning for Tracking Control of AUV With Time-Varying Dynamics

Cited by: 37
Authors
Jiang, Peng [1 ]
Song, Shiji [1 ]
Huang, Gao [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Vehicle dynamics; Task analysis; Trajectory tracking; Trajectory; Tracking; Heuristic algorithms; Adaptation models; Attention mechanism; meta-reinforcement learning (meta-RL); time-varying dynamics
DOI
10.1109/TNNLS.2021.3079148
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning (RL) is a promising technique for designing a model-free controller through interaction with the environment. Several researchers have applied RL to motion control of autonomous underwater vehicles (AUVs), such as trajectory tracking. However, existing RL-based controllers usually assume that the unknown AUV dynamics remain invariant during the operation period, limiting their further application in complex underwater environments. In this article, a novel meta-RL-based control scheme is proposed for trajectory tracking of an AUV in the presence of unknown and time-varying dynamics. To this end, we divide the tracking task for an AUV with time-varying dynamics into multiple specific tasks, each with a fixed instance of the time-varying dynamics, and apply meta-RL to these tasks to distill a general control policy. The obtained policy transfers to the testing phase with high adaptability. Inspired by the line-of-sight (LOS) tracking rule, we formulate each specific task as a Markov decision process (MDP) with a well-designed state and reward function. Furthermore, a novel policy network with an attention module is proposed to extract hidden information about the AUV dynamics. A simulation environment with time-varying dynamics is established, and the simulation results demonstrate the effectiveness of the proposed method.
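The abstract does not reproduce the paper's state and reward design, but the classic LOS tracking rule it draws on has a standard form: steer toward a point a fixed lookahead distance ahead on the desired path. A minimal sketch of that rule follows; the function name, parameters, and straight-line path parameterization are assumptions for illustration, not the paper's implementation.

```python
import math

def los_heading(x, y, wp_prev, wp_next, lookahead):
    """Desired heading under the classic line-of-sight guidance rule.

    The path segment runs from waypoint `wp_prev` to `wp_next`; the
    vehicle at (x, y) is steered toward a point `lookahead` meters
    ahead of its projection onto the path.
    """
    # Path-tangential angle of the current segment.
    alpha = math.atan2(wp_next[1] - wp_prev[1], wp_next[0] - wp_prev[0])
    # Signed cross-track error: distance from the vehicle to the path,
    # positive to the left of the path direction.
    e = -(x - wp_prev[0]) * math.sin(alpha) + (y - wp_prev[1]) * math.cos(alpha)
    # LOS rule: path angle plus a correction that shrinks the error.
    return alpha + math.atan2(-e, lookahead)
```

With the vehicle exactly on a path along the x-axis the desired heading is the path angle itself; when the vehicle drifts to one side, the `atan2(-e, lookahead)` term turns it back toward the path, with the lookahead distance trading off convergence speed against oscillation.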
Pages: 6388-6401
Page count: 14