Monte Carlo-based reinforcement learning control for unmanned aerial vehicle systems

Cited by: 13
Authors
Wei, Qinglai [1 ,2 ,3 ]
Yang, Zesheng [1 ,2 ]
Su, Huaizhong [4 ]
Wang, Lijian [4 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Macau Univ Sci & Technol, Inst Syst Engn, Macau 999078, Peoples R China
[4] Beijing Aeronaut Technol Res Inst COMAC, Beijing 102211, Peoples R China
Keywords
Reinforcement learning; Adaptive dynamic programming (ADP); UAV control; Monte Carlo simulation; Neural networks; LINEAR MULTIAGENT SYSTEMS; NEURAL-NETWORK; NONLINEAR-SYSTEMS; QUADROTOR; UAV; CONSENSUS; DYNAMICS; TRACKING; DESIGN; GAMES;
DOI
10.1016/j.neucom.2022.08.011
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, a new data-driven reinforcement learning method based on Monte Carlo simulation is developed to solve the optimal control problem of unmanned aerial vehicle (UAV) systems. Based on the data generated by Monte Carlo simulation, a neural network (NN) is used to construct the dynamics of the UAV system with unknown disturbances, so that a mathematical model of the UAV system is unnecessary. An effective iterative framework of action and critic is constructed to obtain the optimal control law. A convergence property is established to guarantee that the iterative performance cost function converges to a finite neighborhood of the optimal performance cost function. Finally, numerical results are given to illustrate the effectiveness of the developed method. (c) 2022 Published by Elsevier B.V.
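
The abstract describes a data-driven, actor-critic style iteration built on Monte Carlo-generated data and a learned model of the unknown dynamics. The sketch below is not the paper's algorithm; it is a minimal illustration of that general idea under assumed toy dynamics, a quadratic cost, and a least-squares model fit standing in for the NN dynamics model. All matrices, sample sizes, and names (A_true, B_hat, step, etc.) are illustrative assumptions, not taken from the paper.

    # Minimal, hypothetical sketch of the data-driven actor-critic idea:
    # Monte Carlo rollouts provide data, a model of the unknown dynamics
    # is fitted from that data, and critic/actor updates are iterated on
    # the learned model. Everything here is an illustrative assumption.
    import numpy as np

    rng = np.random.default_rng(0)

    # --- Unknown "UAV-like" dynamics, used only to generate Monte Carlo data ---
    A_true = np.array([[1.0, 0.1], [0.0, 0.95]])   # assumed discrete-time dynamics
    B_true = np.array([[0.0], [0.1]])

    def step(x, u):
        w = 0.01 * rng.standard_normal(2)           # unknown disturbance
        return A_true @ x + (B_true @ u).ravel() + w

    # --- Monte Carlo data generation ---
    X, U, Xn = [], [], []
    for _ in range(200):                            # 200 random rollouts
        x = rng.uniform(-1, 1, size=2)
        for _ in range(20):
            u = rng.uniform(-1, 1, size=1)
            xn = step(x, u)
            X.append(x); U.append(u); Xn.append(xn)
            x = xn
    X, U, Xn = map(np.array, (X, U, Xn))

    # --- Identify a model from data (least squares stands in for the NN model) ---
    Z = np.hstack([X, U])                           # regressors [x, u]
    Theta, *_ = np.linalg.lstsq(Z, Xn, rcond=None)  # Xn ~= Z @ Theta
    A_hat, B_hat = Theta[:2].T, Theta[2:].T

    # --- Actor-critic style iteration on the learned model (quadratic cost) ---
    Q, R = np.eye(2), 0.1 * np.eye(1)
    P = np.zeros((2, 2))                            # critic: value V(x) = x' P x
    for _ in range(100):
        # actor: policy improvement u = -K x from the current critic
        K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
        # critic: policy evaluation via a Riccati-like update
        Acl = A_hat - B_hat @ K
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl

    print("learned feedback gain K =", K)

Under these assumptions the critic update evaluates the current policy and the actor update performs the corresponding improvement; the paper instead develops this iteration with NN approximators on the identified UAV dynamics and proves convergence to a finite neighborhood of the optimal cost.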
Pages: 282-291
Number of pages: 10