Attention-Based Meta-Reinforcement Learning for Tracking Control of AUV With Time-Varying Dynamics

Cited by: 37
Authors
Jiang, Peng [1 ]
Song, Shiji [1 ]
Huang, Gao [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Vehicle dynamics; Task analysis; Trajectory tracking; Trajectory; Tracking; Heuristic algorithms; Adaptation models; Attention mechanism; meta-reinforcement learning (meta-RL); time-varying dynamics
DOI
10.1109/TNNLS.2021.3079148
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning (RL) is a promising technique for designing a model-free controller through interaction with the environment. Several researchers have applied RL to motion control of autonomous underwater vehicles (AUVs), such as trajectory tracking. However, existing RL-based controllers usually assume that the unknown AUV dynamics remain invariant during the operation period, limiting their further application in complex underwater environments. In this article, a novel meta-RL-based control scheme is proposed for trajectory tracking of an AUV in the presence of unknown and time-varying dynamics. To this end, we divide the tracking task for an AUV with time-varying dynamics into multiple specific tasks, each with a fixed instance of the time-varying dynamics, and apply meta-RL to these tasks to distill a general control policy. The obtained policy transfers to the testing phase with high adaptability. Inspired by the line-of-sight (LOS) tracking rule, we formulate each specific task as a Markov decision process (MDP) with a well-designed state and reward function. Furthermore, a novel policy network with an attention module is proposed to extract hidden information about the AUV dynamics. A simulation environment with time-varying dynamics is established, and the simulation results demonstrate the effectiveness of the proposed method.
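The abstract does not reproduce the paper's state and reward design, but the classic LOS tracking rule it draws on has a standard form: steer toward a point a fixed lookahead distance ahead on the desired path. A minimal sketch of that rule follows; the function name, parameters, and straight-line path parameterization are assumptions for illustration, not the paper's implementation.

```python
import math

def los_heading(x, y, wp_prev, wp_next, lookahead):
    """Desired heading under the classic line-of-sight guidance rule.

    The path segment runs from waypoint `wp_prev` to `wp_next`; the
    vehicle at (x, y) is steered toward a point `lookahead` meters
    ahead of its projection onto the path.
    """
    # Path-tangential angle of the current segment.
    alpha = math.atan2(wp_next[1] - wp_prev[1], wp_next[0] - wp_prev[0])
    # Signed cross-track error: distance from the vehicle to the path,
    # positive to the left of the path direction.
    e = -(x - wp_prev[0]) * math.sin(alpha) + (y - wp_prev[1]) * math.cos(alpha)
    # LOS rule: path angle plus a correction that shrinks the error.
    return alpha + math.atan2(-e, lookahead)
```

With the vehicle exactly on a path along the x-axis the desired heading is the path angle itself; when the vehicle drifts to one side, the `atan2(-e, lookahead)` term turns it back toward the path, with the lookahead distance trading off convergence speed against oscillation.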
Pages: 6388-6401
Page count: 14