Online adaptive optimal control algorithm based on synchronous integral reinforcement learning with explorations

Cited by: 7
Authors
Guo, Lei [1 ]
Zhao, Han [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Neural networks; Adaptive control; Actor-critic; Explorations; TIME LINEAR-SYSTEMS; NONLINEAR-SYSTEMS;
DOI
10.1016/j.neucom.2022.11.055
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this study, we present a novel algorithm, based on synchronous policy iteration, for solving the continuous-time infinite-horizon optimal control problem for input-affine system dynamics. The integral reinforcement is measured as an excitation signal to estimate the solution to the Hamilton-Jacobi-Bellman equation. In addition, the proposed method is completely model-free; that is, no a priori knowledge of the system is required. Using the adaptive tuning laws, the actor and critic neural networks simultaneously approximate the optimal value function and the optimal policy. A persistence-of-excitation condition is required to guarantee the convergence of the two networks. Unlike traditional policy iteration algorithms, the proposed method eliminates the restriction that an initial admissible policy must be supplied. The effectiveness of the proposed algorithm is verified through numerical simulations. (c) 2022 Elsevier B.V. All rights reserved.
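For orientation, the following is a minimal sketch of the standard integral reinforcement learning (IRL) formulation for input-affine dynamics that abstracts like this one build on; the notation (f, g, Q, R, and the reinforcement interval T) is generic to this literature and is assumed here, not taken from the paper itself.

% Input-affine dynamics and infinite-horizon value function
% (illustrative notation, standard in the IRL literature):
\begin{align}
  \dot{x} &= f(x) + g(x)\,u, \\
  V(x(t)) &= \int_{t}^{\infty} \left( Q(x(\tau)) + u(\tau)^{\top} R\, u(\tau) \right) d\tau.
\end{align}
% Integral Bellman equation over a reinforcement interval T; evaluating
% this along measured trajectories is what removes the need for explicit
% knowledge of f and g:
\begin{equation}
  V(x(t)) = \int_{t}^{t+T} \left( Q(x) + u^{\top} R\, u \right) d\tau + V(x(t+T)).
\end{equation}
% Hamilton-Jacobi-Bellman equation and the associated optimal policy:
\begin{align}
  0 &= Q(x) + \nabla V^{*}(x)^{\top} f(x)
     - \tfrac{1}{4}\, \nabla V^{*}(x)^{\top} g(x) R^{-1} g(x)^{\top} \nabla V^{*}(x), \\
  u^{*}(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x).
\end{align}

In synchronous schemes of this kind, a critic network approximates V and an actor network approximates u*, with both weight sets tuned online from the residual of the integral Bellman equation; the persistence-of-excitation condition mentioned in the abstract is what guarantees that this residual carries enough information for the network weights to converge.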
Pages: 250-261
Page count: 12