Reinforcement learning-based finite-time tracking control of an unknown unmanned surface vehicle with input constraints

Times Cited: 69
Authors
Wang, Ning [1 ,3 ]
Gao, Ying [2 ]
Yang, Chen [2 ]
Zhang, Xuefeng [2 ]
Affiliations
[1] Dalian Maritime Univ, Sch Marine Engn, Dalian 116026, Peoples R China
[2] Dalian Maritime Univ, Sch Marine Elect Engn, Dalian 116026, Peoples R China
[3] Harbin Engn Univ, Sch Shipbldg Engn, Harbin 150001, Peoples R China
Keywords
Reinforcement learning-based finite-time control; Optimal tracking control; Unknown system dynamics; Input constraints; Unmanned surface vehicle; H-INFINITY CONTROL; NONLINEAR-SYSTEMS; CONTROL DESIGN
DOI
10.1016/j.neucom.2021.04.133
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this paper, subject to completely unknown system dynamics and input constraints, a reinforcement learning-based finite-time trajectory tracking control (RLFTC) scheme is created for an unmanned surface vehicle (USV) by combining an actor-critic reinforcement learning (RL) mechanism with a finite-time control technique. Unlike previous RL-based tracking schemes, which require infinite-time convergence and are therefore rather sensitive to complex unknowns, an actor-critic finite-time control structure is created by employing adaptive neural network identifiers to recursively update the actor and critic, such that learning-based robustness can be sufficiently enhanced. Moreover, deduced from the Bellman error formulation, the proposed RLFTC is directly optimized in a finite-time manner. Theoretical analysis shows that the proposed RLFTC scheme ensures semi-global practical finite-time stability (SGPFS) for the closed-loop USV system, with tracking errors converging to an arbitrarily small neighborhood of the origin in finite time, subject to an optimal cost. Both mathematical simulation and virtual-reality experiments demonstrate the remarkable effectiveness and superiority of the proposed RLFTC scheme. (c) 2021 Elsevier B.V. All rights reserved.
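The abstract describes actor and critic updates driven by the Bellman error. The following is a minimal illustrative sketch of that general mechanism, not the paper's algorithm: a one-parameter critic and a one-parameter actor on an assumed toy scalar plant x_{k+1} = x_k + u_k tracking a constant reference. The plant, cost weights, learning rates, and the clipping of the actor gain to a stabilizing set are all assumptions chosen for the demo.

```python
import numpy as np

# Toy actor-critic sketch driven by the Bellman (temporal-difference) error.
# Everything here (plant, gains, cost weights) is illustrative, not from the paper.
rng = np.random.default_rng(0)
gamma, alpha_c, alpha_a, sigma = 0.9, 0.05, 0.002, 0.1

w = 0.0        # critic weight: V(e) ~= w * e**2 approximates the cost-to-go
k = 0.1        # actor gain: mean control u = -k * e
x, x_ref = 0.0, 1.0
errors = []

for step in range(2000):
    e = x - x_ref
    u = -k * e + sigma * rng.standard_normal()   # Gaussian exploration noise
    x_next = x + u                               # assumed integrator plant
    e_next = x_next - x_ref
    cost = e**2 + 0.1 * u**2
    # Bellman error: residual of the critic in the Bellman equation
    delta = cost + gamma * w * e_next**2 - w * e**2
    w += alpha_c * delta * e**2                  # critic: TD(0) update
    # actor: policy-gradient step lowering the likelihood of costly actions
    grad_log_pi = (u + k * e) / sigma**2 * (-e)
    k -= alpha_a * delta * grad_log_pi
    k = float(np.clip(k, 0.05, 1.9))             # project gain onto a stabilizing set
    x = x_next
    errors.append(abs(e))
```

In this sketch the same scalar `delta` drives both updates, which is the shared structure of actor-critic methods; the paper's contribution, per the abstract, is recasting such updates with neural-network identifiers so that convergence is finite-time rather than asymptotic.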
Pages: 26-37 (12 pages)