Reinforcement learning-based tracking control for a quadrotor unmanned aerial vehicle under external disturbances

Cited by: 30

Authors:

Liu, Hui ^{[1]}

Li, Bo ^{[1]}

Xiao, Bing ^{[2]}

Ran, Dechao ^{[3]}

Zhang, Chengxi ^{[4]}

Affiliations:

[1] Shanghai Maritime Univ, Inst Logist Sci & Engn, Shanghai 201306, Peoples R China

[2] Northwestern Polytech Univ, Sch Automat, Xian, Peoples R China

[3] Chinese Acad Mil Sci, Natl Innovat Inst Def Technol, Beijing, Peoples R China

[4] Harbin Inst Technol, Sch Elect & Informat Engn, Shenzhen, Peoples R China

Source:

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL | 2023 / Vol. 33 / Issue 17

Funding:

National Natural Science Foundation of China;

Keywords:

adaptive dynamic programming; appointed-fixed-time observer; reinforcement learning; trajectory tracking control; unmanned aerial vehicle; APPROXIMATE OPTIMAL-CONTROL; UNCERTAIN NONLINEAR-SYSTEMS; FAULT-TOLERANT CONTROL; TRAJECTORY TRACKING; ROBUST-CONTROL; DESIGN; OBSERVER; STABILIZATION; SUBJECT;

DOI:

10.1002/rnc.6334

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This article addresses the high-accuracy intelligent trajectory tracking control problem of a quadrotor unmanned aerial vehicle (UAV) subject to external disturbances. The tracking error systems are first reestablished by using feedforward control to compensate for the raw error dynamics of the quadrotor UAV. Then, two novel appointed-fixed-time observers are designed for the processed error systems to reconstruct the disturbance forces and torques, respectively; the observation errors converge to the origin within an appointed time defined by users or designers. Subsequently, two novel control policies are developed using reinforcement learning methodology, which can balance control cost and control performance. Meanwhile, two critic neural networks replace the traditional actor-critic networks for approximating the solutions of the Hamilton-Jacobi-Bellman equations. More specifically, two novel weight update laws are developed that not only update the weights of the critic neural networks online but also avoid the persistent excitation condition. The uniformly ultimately bounded stability of the whole control system under the proposed reinforcement learning-based control policies is then proved by the Lyapunov method. Finally, simulation results are presented to illustrate the effectiveness and superior performance of the developed control scheme.

Citation

Pages: 10360-10377

Page count: 18

References: 47 in total

[1] Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach [J].

Abu-Khalaf, M ;

Lewis, FL .

AUTOMATICA, 2005, 41 (05) :779-791

[2] A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems [J].

Bhasin, S. ;

Kamalapurkar, R. ;

Johnson, M. ;

Vamvoudakis, K. G. ;

Lewis, F. L. ;

Dixon, W. E. .

AUTOMATICA, 2013, 49 (01) :82-92

[3] Optimal Tracking Control for Uncertain Nonlinear Systems With Prescribed Performance via Critic-Only ADP [J].

Dong, Hongyang ;

Zhao, Xiaowei ;

Luo, Biao .

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (01) :561-573

[4] Reinforcement Learning-Based Approximate Optimal Control for Attitude Reorientation Under State Constraints [J].

Dong, Hongyang ;

Zhao, Xiaowei ;

Yang, Haoyang .

IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2021, 29 (04) :1664-1673

[5] Robust tracking control of quadrotor via on-policy adaptive dynamic programming [J].

Dou, Liqian ;

Su, Xiaotong ;

Zhao, Xinyi ;

Zong, Qun ;

He, Lei .

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2021, 31 (07) :2509-2525

[6] A review of quadrotor: An underactuated mechanical system [J].

Emran, Bara J. ;

Najjaran, Homayoun .

ANNUAL REVIEWS IN CONTROL, 2018, 46 :165-180

[7] Adaptive Actor-Critic Design-Based Integral Sliding-Mode Control for Partially Unknown Nonlinear Systems With Input Disturbances [J].

Fan, Quan-Yong ;

Yang, Guang-Hong .

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (01) :165-177

[8] Learning-Based 6-DOF Control for Autonomous Proximity Operations Under Motion Constraints [J].

Hu, Qinglei ;

Yang, Haoyang ;

Dong, Hongyang ;

Zhao, Xiaowei .

IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2021, 57 (06) :4097-4109

[9] Approximate optimal trajectory tracking for continuous-time nonlinear systems [J].

Kamalapurkar, Rushikesh ;

Dinh, Huyen ;

Bhasin, Shubhendu ;

Dixon, Warren E. .

AUTOMATICA, 2015, 51 :40-48

[10] Adaptive output-feedback neural tracking control for uncertain switched MIMO nonlinear systems with time delays [J].

Kong, Jie ;

Niu, Ben ;

Wang, Zhenhua ;

Zhao, Ping ;

Qi, Wenhai .

INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2021, 52 (13) :2813-2830
