Reinforcement Learning and Adaptive Optimal Control for Continuous-Time Nonlinear Systems: A Value Iteration Approach

Citations: 88
Authors
Bian, Tao [1 ]
Jiang, Zhong-Ping [1 ]
Affiliation
[1] NYU, Control & Networks Lab, Tandon Sch Engn, Dept Elect & Comp Engn, Brooklyn, NY 11201 USA
Funding
U.S. National Science Foundation;
Keywords
Nonlinear systems; Optimal control; Adaptive systems; Dynamical systems; Mathematical model; Heuristic algorithms; Linear systems; Adaptive optimal control; nonlinear systems; value iteration (VI); INTERCONNECTED SYSTEMS; STABILIZATION;
DOI
10.1109/TNNLS.2020.3045087
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article studies the adaptive optimal control problem for continuous-time nonlinear systems described by differential equations. A key strategy is to exploit the value iteration (VI) method, originally proposed by Bellman in 1957, as a fundamental tool for solving dynamic programming problems. However, previous VI methods have been devoted exclusively to Markov decision processes and discrete-time dynamical systems. This article aims to fill this gap by developing a new continuous-time VI method, which is then applied to both adaptive and nonadaptive optimal control problems for continuous-time systems described by differential equations. Like traditional VI, the continuous-time VI algorithm retains the appealing feature that no initial admissible control policy needs to be known. As a direct application of the proposed VI method, a new class of adaptive optimal controllers is obtained for nonlinear systems with completely unknown dynamics. A learning-based control algorithm is proposed to show how robust optimal controllers can be learned directly from real-time data. Finally, two examples are given to illustrate the efficacy of the proposed methodology.
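To make the idea concrete, here is a minimal illustrative sketch (not taken from the paper) of continuous-time VI specialized to the linear-quadratic case. For dynamics dx/dt = Ax + Bu with cost ∫(xᵀQx + uᵀRu)dt, the value function is quadratic, V(x) = xᵀPx, and the VI update reduces to an Euler step along the differential Riccati flow, P ← P + ε(AᵀP + PA + Q − PBR⁻¹BᵀP), started from P = 0, so no initial stabilizing policy is required. The double-integrator matrices, step size, and iteration count below are assumptions chosen for illustration.

```python
import numpy as np

def ct_value_iteration(A, B, Q, R, eps=0.005, iters=20000):
    """Continuous-time VI for the LQR case: Euler steps along the
    Riccati flow, starting from P = 0 (no admissible policy needed).
    Returns the converged cost matrix P."""
    n = A.shape[0]
    P = np.zeros((n, n))
    Rinv = np.linalg.inv(R)
    for _ in range(iters):
        # One VI step: move P along the Riccati residual direction.
        P = P + eps * (A.T @ P + P @ A + Q - P @ B @ Rinv @ B.T @ P)
    return P

# Illustrative example: double integrator with unit weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = ct_value_iteration(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P  # optimal state feedback u = -K x
```

At convergence P satisfies the algebraic Riccati equation, and A − BK is Hurwitz; for this example the limit is P = [[√3, 1], [1, √3]]. The paper's actual algorithm operates on general nonlinear systems and learns from real-time data rather than from known (A, B); this sketch only shows the fixed-point iteration at its core.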
Pages: 2781-2790
Page count: 10