Reinforcement learning-based optimal control of unknown constrained-input nonlinear systems using simulated experience

Cited: 0
Authors
Asl, Hamed Jabbari [1 ]
Uchibe, Eiji [1 ]
Affiliation
[1] ATR Computat Neurosci Labs, Dept Brain Robot Interface, 2-2-2 Hikaridai, Seika, Kyoto 619-0288, Japan
Keywords
Optimal control; Reinforcement learning; Input constraints; Uncertainty; Approximate optimal control; Tracking control; Continuous time
DOI
10.1007/s11071-023-08688-0
CLC Number
TH [Machinery and Instrument Industry]
Discipline Code
0802
Abstract
Reinforcement learning (RL) provides a way to approximately solve optimal control problems. However, solving such problems online requires a method that guarantees convergence to the optimal policy while also keeping the system stable during learning. In this study, we develop an online RL-based optimal control framework for input-constrained nonlinear systems. The design includes two new model identifiers that learn the system's drift dynamics: a slow identifier, used to simulate the experience that supports convergence to the solution of the optimal control problem, and a fast identifier, which keeps the system stable during the learning phase. The approach is a critic-only design in which a new fast estimation law is developed for the critic network. A Lyapunov-based analysis shows that the estimated control policy converges to the optimal one, and simulation studies demonstrate the effectiveness of the developed control scheme.
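The setting described in the abstract, an input-constrained optimal control problem solved online by a critic network, is commonly formalized with a non-quadratic cost that keeps the minimizing policy within the actuator bounds, following the classical approach of Abu-Khalaf and Lewis (Automatica, 2005). The LaTeX sketch below shows that standard formulation as an illustration of the problem class only; the symbols (drift f, input gain g, bound \bar{u}, diagonal weight R with entries r_i) are assumptions, and the paper's exact equations may differ.

% Sketch of the standard input-constrained formulation (assumed, not
% reproduced from the paper): dynamics \dot{x} = f(x) + g(x)u with
% unknown drift f(x) and actuator bound |u_i| <= \bar{u}.
\[
  V^{*}(x_{0}) = \min_{u}\int_{0}^{\infty}\bigl(Q(x) + W(u)\bigr)\,\mathrm{d}t,
  \qquad
  W(u) = 2\sum_{i=1}^{m}\int_{0}^{u_{i}} \bar{u}\,r_{i}\tanh^{-1}\!\bigl(v/\bar{u}\bigr)\,\mathrm{d}v.
\]
% The penalty W(u) grows without bound as u_i approaches \pm\bar{u}, so
% the optimal policy is automatically saturated:
\[
  u^{*}(x) = -\,\bar{u}\,\tanh\!\Bigl(\tfrac{1}{2\bar{u}}\,R^{-1}g^{\top}(x)\,\nabla V^{*}(x)\Bigr),
\]
% which a critic network can realize online by approximating V^{*}; an
% identifier-supplied estimate of f(x) stands in for the unknown drift
% in the Bellman error used to train the critic.

In this reading, the slow identifier's simulated experience would supply the data richness (in place of a persistent-excitation requirement) needed for the critic weights to converge, while the fast identifier preserves closed-loop stability; this is an interpretation consistent with the abstract, not a claim about the paper's proofs.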
Pages: 16093-16110
Page count: 18