Reinforcement learning-based optimal control of unknown constrained-input nonlinear systems using simulated experience

Cited: 0
Authors
Asl, Hamed Jabbari [1 ]
Uchibe, Eiji [1 ]
Affiliation
[1] ATR Computat Neurosci Labs, Dept Brain Robot Interface, 2-2-2 Hikaridai, Seika, Kyoto 619-0288, Japan
Keywords
Optimal control; Reinforcement learning; Input constraints; Uncertainty; Approximate optimal control; Tracking control; Continuous time
DOI
10.1007/s11071-023-08688-0
CLC Number
TH [Machinery and Instrument Industry]
Discipline Code
0802
Abstract
Reinforcement learning (RL) provides a way to approximately solve optimal control problems. However, solving such problems online requires a method that guarantees convergence to the optimal policy while also keeping the system stable during learning. In this study, we develop an online RL-based optimal control framework for input-constrained nonlinear systems. The design includes two new model identifiers that learn the system's drift dynamics: a slow identifier, used to simulate the experience that supports convergence to the solution of the optimal control problem, and a fast identifier, which keeps the system stable during the learning phase. The approach is a critic-only design in which a new fast estimation law is developed for the critic network. A Lyapunov-based analysis shows that the estimated control policy converges to the optimal one, and simulation studies demonstrate the effectiveness of the developed control scheme.
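The setting described in the abstract, an input-constrained optimal control problem solved online by a critic network, is commonly formalized with a non-quadratic cost that keeps the minimizing policy within the actuator bounds, following the classical approach of Abu-Khalaf and Lewis (Automatica, 2005). The LaTeX sketch below shows that standard formulation as an illustration of the problem class only; the symbols (drift f, input gain g, bound \bar{u}, diagonal weight R with entries r_i) are assumptions, and the paper's exact equations may differ.

% Sketch of the standard input-constrained formulation (assumed, not
% reproduced from the paper): dynamics \dot{x} = f(x) + g(x)u with
% unknown drift f(x) and actuator bound |u_i| <= \bar{u}.
\[
  V^{*}(x_{0}) = \min_{u}\int_{0}^{\infty}\bigl(Q(x) + W(u)\bigr)\,\mathrm{d}t,
  \qquad
  W(u) = 2\sum_{i=1}^{m}\int_{0}^{u_{i}} \bar{u}\,r_{i}\tanh^{-1}\!\bigl(v/\bar{u}\bigr)\,\mathrm{d}v.
\]
% The penalty W(u) grows without bound as u_i approaches \pm\bar{u}, so
% the optimal policy is automatically saturated:
\[
  u^{*}(x) = -\,\bar{u}\,\tanh\!\Bigl(\tfrac{1}{2\bar{u}}\,R^{-1}g^{\top}(x)\,\nabla V^{*}(x)\Bigr),
\]
% which a critic network can realize online by approximating V^{*}; an
% identifier-supplied estimate of f(x) stands in for the unknown drift
% in the Bellman error used to train the critic.

In this reading, the slow identifier's simulated experience would supply the data richness (in place of a persistent-excitation requirement) needed for the critic weights to converge, while the fast identifier preserves closed-loop stability; this is an interpretation consistent with the abstract, not a claim about the paper's proofs.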
Pages: 16093-16110
Page count: 18