Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design

Cited by: 228

Authors:

Bian, Tao ^{[1]}

Jiang, Zhong-Ping ^{[1]}

Affiliations:

[1] NYU, Tandon Sch Engn, Dept Elect & Comp Engn, Control & Networks Lab, Metrotech Ctr 5, Brooklyn, NY 11201 USA

Source:

AUTOMATICA | 2016 / Vol. 71

Funding:

U.S. National Science Foundation;

Keywords:

Value iteration; Adaptive dynamic programming; Optimal control; Adaptive control; Stochastic approximation; CONTINUOUS-TIME SYSTEMS; STOCHASTIC-APPROXIMATION; NONLINEAR-SYSTEMS; LINEAR-SYSTEMS; ROBUST STABILIZATION; STABILITY; DELAYS;

DOI:

10.1016/j.automatica.2016.05.003

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper presents a novel non-model-based, data-driven adaptive optimal controller design for linear continuous-time systems with completely unknown dynamics. Inspired by the stochastic approximation theory, a continuous-time version of the traditional value iteration (VI) algorithm is presented with rigorous convergence analysis. This VI method is crucial for developing new adaptive dynamic programming methods to solve the adaptive optimal control problem and the stochastic robust optimal control problem for linear continuous-time systems. Fundamentally different from existing results, the a priori knowledge of an initial admissible control policy is no longer required. The efficacy of the proposed methodology is illustrated by two examples and a brief comparative study between VI and earlier policy iteration methods. (C) 2016 Elsevier Ltd. All rights reserved.
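To make the idea of a continuous-time value iteration concrete, here is a minimal model-based sketch in Python. It is not the paper's data-driven algorithm, which works without knowledge of the system matrices. It assumes that the iteration takes the form of a stochastic-approximation-style update of the Riccati residual, P_{k+1} = P_k + eps_k (A^T P_k + P_k A + Q - P_k B R^{-1} B^T P_k), started from P_0 = 0 so that no stabilizing initial policy is needed. The function name `continuous_time_vi`, the step-size schedule `0.1 / sqrt(k + 1)`, the double-integrator plant, and the weights Q and R are all illustrative assumptions, and any safeguard that keeps the iterates bounded is omitted.

```python
import numpy as np


def continuous_time_vi(A, B, Q, R, num_iters=5000):
    """Hypothetical helper: model-based VI sketch for continuous-time LQR.

    Repeats P <- P + eps_k * (A'P + PA + Q - P B R^{-1} B' P) from P = 0.
    The step sizes eps_k shrink to zero but are not summable (assumed schedule).
    """
    n = A.shape[0]
    P = np.zeros((n, n))          # no initial admissible policy is required
    R_inv = np.linalg.inv(R)
    for k in range(num_iters):
        eps_k = 0.1 / np.sqrt(k + 1.0)
        # Residual of the algebraic Riccati equation at the current iterate
        residual = A.T @ P + P @ A + Q - P @ B @ R_inv @ B.T @ P
        P = P + eps_k * residual
    K = R_inv @ B.T @ P           # feedback gain for u = -K x
    return P, K


# Illustrative plant (assumption): double integrator, not open-loop stable
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = continuous_time_vi(A, B, Q, R)
print("P =", P)
print("K =", K)
```

For this plant the algebraic Riccati equation can be solved by hand, giving P = [[sqrt(3), 1], [1, sqrt(3)]] and K = [1, sqrt(3)], so the iterates can be checked against a closed-form answer. A policy iteration scheme, by contrast, would need a stabilizing gain to start from, which is the requirement the abstract says is no longer needed.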


Pages: 348-360

Page count: 13

