Optimized tracking control using reinforcement learning and backstepping technique for canonical nonlinear unknown dynamic system

Times Cited: 1
Authors
Song, Yanfen [1 ,2 ]
Li, Zijun [1 ,2 ]
Wen, Guoxing [2 ,3 ]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Sch Math & Stat, Jinan, Peoples R China
[2] Shandong Univ Aeronaut, Coll Sci, Binzhou, Peoples R China
[3] Shandong Univ Aeronaut, Coll Sci, Binzhou 256600, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
backstepping; identifier-critic-actor architecture; nonlinear canonical system; optimal control; reinforcement learning; CONTINUOUS-TIME; NEURAL-CONTROL; PERFORMANCE; ALGORITHM;
DOI
10.1002/oca.3115
CLC Number
TP [Automation technology, computer technology];
Subject Classification Code
0812;
Abstract
This work addresses the optimized tracking control problem for a canonical nonlinear dynamic system with unknown dynamics by combining reinforcement learning (RL) with the backstepping technique. Because such a system contains multiple state variables linked by differential relations, the backstepping technique is applied by constructing a sequence of virtual controls in accordance with Lyapunov functions. In the final backstepping step, the optimized actual control is derived by performing RL under an identifier-critic-actor structure, where RL is used to overcome the difficulty of solving the Hamilton-Jacobi-Bellman (HJB) equation. Unlike traditional RL optimization methods, which derive the RL updating laws from the square of the HJB equation's approximation, this optimized control derives the RL training laws from the negative gradient of a simple positive definite function that is equivalent to the HJB equation. As a result, the proposed control markedly reduces algorithm complexity and, at the same time, removes the requirement of known dynamics. Finally, theoretical analysis and simulation demonstrate the feasibility of this optimized control.
[Graphical abstract: Executive process of the optimized backstepping control.]
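The abstract's contrast between the two weight-tuning routes can be made concrete with a minimal sketch. The scalar subsystem, quadratic cost, basis function phi, weights W_c and W_a, gains gamma_c and gamma_a, and the positive definite function P below are illustrative assumptions for exposition, not the paper's notation.

% Minimal sketch (illustrative assumptions, not the paper's notation):
% assumed scalar subsystem and infinite-horizon quadratic cost
\begin{align}
\dot{z} &= f(z) + u, &
J\big(z(t)\big) &= \int_{t}^{\infty}\big(q z^{2} + r u^{2}\big)\,d\tau, \\
% HJB equation satisfied by the optimal value J^* and optimal control u^*
0 &= q z^{2} + r u^{*2} + \frac{dJ^{*}}{dz}\big(f(z) + u^{*}\big), &
u^{*} &= -\frac{1}{2r}\,\frac{dJ^{*}}{dz}, \\
% critic and actor approximate the value gradient with weights \hat{W}_c, \hat{W}_a
\widehat{\frac{dJ}{dz}} &= \hat{W}_{c}^{\top}\varphi(z), &
\hat{u} &= -\frac{1}{2r}\,\hat{W}_{a}^{\top}\varphi(z), \\
% training laws taken as the negative gradient of a positive definite function P
% (conventional schemes instead descend the squared HJB residual)
\dot{\hat{W}}_{c} &= -\gamma_{c}\,\frac{\partial P(\hat{W}_{c},\hat{W}_{a})}{\partial \hat{W}_{c}}, &
\dot{\hat{W}}_{a} &= -\gamma_{a}\,\frac{\partial P(\hat{W}_{c},\hat{W}_{a})}{\partial \hat{W}_{a}}.
\end{align}

Taking the training laws from the negative gradient of such a P, rather than from gradient descent on the squared HJB residual, is the step the abstract credits with the reduced algorithm complexity.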
Pages: 1655-1671
Page count: 17