Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof

Cited by: 194

Authors:

Al-Tamimi, Asma [1]

Lewis, Frank [2]

Affiliations:

[1] Univ Texas, Automat & Robot Res Inst, Ft Worth, TX 76118 USA

[2] Univ Texas Arlington, Automat & Robot Res Inst, Ft Worth, TX 76118 USA

Source:

2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING | 2007

Funding:

US National Science Foundation;

Keywords:

adaptive critics; approximate dynamic programming; HJB; policy iterations;

DOI:

10.1109/ADPRL.2007.368167

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely Heuristic Dynamic Programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman (HJB) equation that appears in discrete-time (DT) nonlinear optimal control. Two neural networks are used: one to approximate the value function and one to approximate the optimal control action. The importance of ADP is that it allows one to solve the HJB equation for general nonlinear discrete-time systems by using a neural network to approximate the value function. The contribution of this paper is a rigorous proof of convergence of the HDP iteration scheme for general discrete-time nonlinear systems with continuous state and action spaces. Two examples are provided. The first is a linear system, for which ADP is found to converge to the correct solution of the algebraic Riccati equation (ARE). The second considers a nonlinear control system.
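The linear example mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual value-iteration form of HDP, V_{i+1}(x_k) = min_u [ q x_k^2 + r u_k^2 + V_i(x_{k+1}) ] with V_0 = 0, applied to a scalar system x_{k+1} = a x_k + b u_k. The value function is represented exactly as V_i(x) = p_i x^2 in place of the neural networks described in the abstract, and the function names and numerical values are illustrative assumptions, not taken from the paper.

```python
def greedy_gain(p, a, b, r):
    """Gain k of the control u = -k*x that minimizes r*u**2 + p*(a*x + b*u)**2."""
    return a * b * p / (r + b * b * p)


def hdp_scalar_lqr(a, b, q, r, max_iters=1000, tol=1e-12):
    """Value iteration on V_i(x) = p_i * x**2, starting from V_0 = 0.

    Hypothetical helper for illustration: the quadratic form stands in for
    the critic network, and the closed-form gain stands in for the action
    network.
    """
    p = 0.0  # V_0(x) = 0
    for _ in range(max_iters):
        k = greedy_gain(p, a, b, r)  # greedy policy with respect to V_i
        # V_{i+1}: one-step cost under the greedy policy plus V_i at the next state
        p_next = q + r * k * k + p * (a - b * k) ** 2
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    return p, greedy_gain(p, a, b, r)


def are_residual(p, a, b, q, r):
    """Residual of the scalar discrete-time algebraic Riccati equation at p."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # open-loop unstable example (assumed values)
    p, k = hdp_scalar_lqr(a, b, q, r)
    print("p =", p, "gain =", k, "ARE residual =", are_residual(p, a, b, q, r))
```

In this scalar case the iteration reduces to the Riccati difference equation, so a near-zero ARE residual at the returned p is one way to check that the iterates have reached the ARE solution.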


Pages: 38 / +

Page count: 2
