Convergence analysis of the deep neural networks based globalized dual heuristic programming

Cited by: 9

Authors:

Kim, Jong Woo [1]

Oh, Tae Hoon [2]

Son, Sang Hwan [3]

Jeong, Dong Hwi [1]

Lee, Jong Min [2]

Affiliations:

[1] Seoul Natl Univ, Engn Dev Res Ctr, 1 Gwanak Ro, Seoul 08826, South Korea

[2] Seoul Natl Univ, Inst Chem Proc, Sch Chem & Biol Engn, 1 Gwanak Ro, Seoul 08826, South Korea

[3] Texas A&M Univ, Artie McFerrin Dept Chem Engn, College Stn, TX 77845 USA

Source:

AUTOMATICA | 2020 / Vol. 122

Funding:

National Research Foundation of Singapore;

Keywords:

Approximate dynamic programming; Reinforcement learning; Deep neural networks; Lyapunov stability; Nonlinear control;

DOI:

10.1016/j.automatica.2020.109222

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

The globalized dual heuristic programming (GDHP) algorithm is a special form of the approximate dynamic programming (ADP) method that solves the Hamilton-Jacobi-Bellman (HJB) equation for the case where the system takes a control-affine form subject to a quadratic cost function. This study incorporates deep neural networks (DNNs) as the function approximator to exploit their ability to represent high-dimensional function spaces. An elementwise error bound on the costate function sequence is newly derived, and its convergence property is presented. In the approximated function space, a uniformly ultimate boundedness (UUB) condition for the weights of general multi-layer NNs is obtained. It is also proved that, under the gradient descent method for solving the moving-target regression problem, the UUB bound gradually converges to a value that contains only the approximation reconstruction error. The proposed method is demonstrated on a continuous reactor control problem, with the aim of obtaining the control policy for multiple initial states, which justifies the necessity of the DNN structure for such cases. (c) 2020 Elsevier Ltd. All rights reserved.
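
For orientation, the problem class named in the abstract (control-affine dynamics, quadratic cost, HJB equation) can be sketched in a common continuous-time textbook form. This is an illustration under assumptions, not the paper's own formulation: the symbols f, g, Q, R, V^*, and \lambda^* do not appear in this record, R is assumed symmetric positive definite, and the paper may use a discrete-time setting or different notation. GDHP-type methods approximate both the value function and its gradient (the costate).

```latex
\begin{align}
  % Control-affine dynamics and quadratic cost (assumed notation)
  \dot{x} &= f(x) + g(x)\,u, &
  J(x_0) &= \int_0^\infty \left( x^\top Q x + u^\top R u \right) dt \\
  % HJB equation for the optimal value function V^*
  0 &= \min_{u} \left[ x^\top Q x + u^\top R u
        + \nabla V^*(x)^\top \bigl( f(x) + g(x)\,u \bigr) \right] \\
  % Costate and the minimizing control (from 2Ru + g^\top \lambda^* = 0)
  \lambda^*(x) &= \nabla V^*(x), &
  u^*(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^\top \lambda^*(x)
\end{align}
```

In this form the optimal control depends on the value function only through the costate, which may explain why an error bound on the costate sequence is the quantity of interest.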

References

Pages: 8

39 entries in total

[1] Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach [J].

Abu-Khalaf, M ;

Lewis, FL .

AUTOMATICA, 2005, 41 (05) :779-791

[2] Fixed final time optimal control approach for bounded robust controller design using Hamilton-Jacobi-Bellman solution [J].

Adhyaru, D. M. ;

Kar, I. N. ;

Gopal, M. .

IET CONTROL THEORY AND APPLICATIONS, 2009, 3 (09) :1183-1195

[3] Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof [J].

Al-Tamimi, Asma ;

Lewis, Frank L. ;

Abu-Khalaf, Murad .

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38 (04) :943-949

[4] Optimal Control of Propagating Fronts by Using Level Set Methods and Neural Approximations [J].

Alessandri, Angelo ;

Bagnerini, Patrizia ;

Gaggero, Mauro .

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (03) :902-912

[5] Feedback Optimal Control of Distributed Parameter Systems by Using Finite-Dimensional Approximation Schemes [J].

Alessandri, Angelo ;

Gaggero, Mauro ;

Zoppoli, Riccardo .

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2012, 23 (06) :984-996

[6]

[Anonymous], 2005, Dynamic Programming & Optimal Control

[7]

Bertsekas D., 1996, Neuro-Dynamic Programming, V27

[8] A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems [J].

Bhasin, S. ;

Kamalapurkar, R. ;

Johnson, M. ;

Vamvoudakis, K. G. ;

Lewis, F. L. ;

Dixon, W. E. .

AUTOMATICA, 2013, 49 (01) :82-92

[9] An overview of simultaneous strategies for dynamic optimization [J].

Biegler, Lorenz T. .

CHEMICAL ENGINEERING AND PROCESSING-PROCESS INTENSIFICATION, 2007, 46 (11) :1043-1053

[10]

Chen H., 1995, Proceedings of the Third European Control Conference. ECC 95, P3247
