Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems

Cited by: 593
Authors
Liu, Derong [1 ]
Wei, Qinglai [1 ]
Affiliation
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; discrete-time policy iteration; neural networks; neurodynamic programming; nonlinear systems; optimal control; reinforcement learning; NETWORKED CONTROL-SYSTEM; OPTIMAL TRACKING CONTROL; ONLINE LEARNING CONTROL; CONTROL SCHEME; FEEDBACK-CONTROL; CRITIC DESIGNS; REINFORCEMENT; APPROXIMATION; ARCHITECTURE;
DOI
10.1109/TNNLS.2013.2281663
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law that optimizes the iterative performance index function. The main contribution of this paper is to analyze, for the first time, the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative performance index function converges nonincreasingly to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that every iterative control law stabilizes the nonlinear system. To facilitate the implementation of the iterative ADP algorithm, neural networks are used to approximate the performance index function and to compute the optimal control law, and the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
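The policy iteration scheme described in the abstract alternates two steps: policy evaluation, which solves the fixed-point equation V_i(x) = U(x, v_i(x)) + V_i(F(x, v_i(x))) for the current control law, and policy improvement, which takes the greedy control law with respect to V_i. The following Python sketch runs this iteration on a coarsely discretized scalar system. Everything specific here is an illustrative assumption: the dynamics F, the utility U, the grids, and the discount factor (added for numerical safety; the paper studies the undiscounted problem), with nearest-neighbor grid lookup standing in for the paper's neural-network approximators.

```python
import numpy as np

# Toy policy iteration ADP on a discretized scalar system.
# Dynamics, utility, grids, and discount are assumptions for this sketch;
# the paper uses neural networks instead of a lookup table and no discount.

xs = np.linspace(-1.0, 1.0, 41)   # state grid
us = np.linspace(-1.0, 1.0, 41)   # control grid
gamma = 0.95                      # discount factor (numerical safeguard)

def F(x, u):
    """Assumed system dynamics x_{k+1} = F(x_k, u_k)."""
    return 0.9 * x + 0.1 * x**3 + u

def U(x, u):
    """Utility (stage cost)."""
    return x**2 + u**2

def snap(x):
    """Index of the grid point nearest to x (clipped to the grid range)."""
    return int(np.abs(xs - np.clip(x, xs[0], xs[-1])).argmin())

def evaluate(policy, tol=1e-10):
    """Policy evaluation: iterate V(x) <- U(x, v(x)) + gamma * V(F(x, v(x)))."""
    V = np.zeros_like(xs)
    while True:
        Vn = np.array([U(x, policy[i]) + gamma * V[snap(F(x, policy[i]))]
                       for i, x in enumerate(xs)])
        if np.max(np.abs(Vn - V)) < tol:
            return Vn
        V = Vn

def improve(V):
    """Policy improvement: v(x) = argmin_u [U(x, u) + gamma * V(F(x, u))]."""
    return np.array([us[np.argmin([U(x, u) + gamma * V[snap(F(x, u))]
                                   for u in us])] for x in xs])

# Admissible initial control law (deadbeat for these dynamics), snapped
# onto the control grid so improvement searches the same action set.
p0 = -0.9 * xs - 0.1 * xs**3
policy = np.array([us[np.abs(us - p).argmin()] for p in p0])

values = []
for _ in range(5):
    V = evaluate(policy)   # evaluate the current iterative control law
    values.append(V)
    policy = improve(V)    # greedy update of the iterative control law

# Largest pointwise change between successive iterative performance index
# functions; policy iteration keeps these nonpositive (nonincreasing).
drops = [float(np.max(values[i + 1] - values[i])) for i in range(len(values) - 1)]
```

On this toy problem the stored value functions decrease monotonically across iterations and every intermediate control law keeps the closed loop bounded, mirroring the nonincreasing convergence and stability properties proven in the paper.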
Pages: 621-634
Page count: 14