Reinforcement Learning and Adaptive Optimal Control for Continuous-Time Nonlinear Systems: A Value Iteration Approach

Citations: 88
Authors
Bian, Tao [1 ]
Jiang, Zhong-Ping [1 ]
Affiliation
[1] NYU, Control & Networks Lab, Tandon Sch Engn, Dept Elect & Comp Engn, Brooklyn, NY 11201 USA
Funding
U.S. National Science Foundation;
Keywords
Nonlinear systems; Optimal control; Adaptive systems; Dynamical systems; Mathematical model; Heuristic algorithms; Linear systems; Adaptive optimal control; nonlinear systems; value iteration (VI); INTERCONNECTED SYSTEMS; STABILIZATION;
DOI
10.1109/TNNLS.2020.3045087
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article studies the adaptive optimal control problem for continuous-time nonlinear systems described by differential equations. A key strategy is to exploit the value iteration (VI) method, originally proposed by Bellman in 1957, as a fundamental tool for solving dynamic programming problems. However, previous VI methods have been devoted exclusively to Markov decision processes and discrete-time dynamical systems. This article aims to fill this gap by developing a new continuous-time VI method, which is then applied to both adaptive and nonadaptive optimal control problems for continuous-time systems described by differential equations. Like traditional VI, the continuous-time VI algorithm retains the appealing feature that no initial admissible control policy needs to be known. As a direct application of the proposed VI method, a new class of adaptive optimal controllers is obtained for nonlinear systems with completely unknown dynamics. A learning-based control algorithm is proposed to show how robust optimal controllers can be learned directly from real-time data. Finally, two examples are given to illustrate the efficacy of the proposed methodology.
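To make the idea concrete, here is a minimal illustrative sketch (not taken from the paper) of continuous-time VI specialized to the linear-quadratic case. For dynamics dx/dt = Ax + Bu with cost ∫(xᵀQx + uᵀRu)dt, the value function is quadratic, V(x) = xᵀPx, and the VI update reduces to an Euler step along the differential Riccati flow, P ← P + ε(AᵀP + PA + Q − PBR⁻¹BᵀP), started from P = 0, so no initial stabilizing policy is required. The double-integrator matrices, step size, and iteration count below are assumptions chosen for illustration.

```python
import numpy as np

def ct_value_iteration(A, B, Q, R, eps=0.005, iters=20000):
    """Continuous-time VI for the LQR case: Euler steps along the
    Riccati flow, starting from P = 0 (no admissible policy needed).
    Returns the converged cost matrix P."""
    n = A.shape[0]
    P = np.zeros((n, n))
    Rinv = np.linalg.inv(R)
    for _ in range(iters):
        # One VI step: move P along the Riccati residual direction.
        P = P + eps * (A.T @ P + P @ A + Q - P @ B @ Rinv @ B.T @ P)
    return P

# Illustrative example: double integrator with unit weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = ct_value_iteration(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P  # optimal state feedback u = -K x
```

At convergence P satisfies the algebraic Riccati equation, and A − BK is Hurwitz; for this example the limit is P = [[√3, 1], [1, √3]]. The paper's actual algorithm operates on general nonlinear systems and learns from real-time data rather than from known (A, B); this sketch only shows the fixed-point iteration at its core.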
Pages: 2781-2790
Page count: 10