Finite-horizon optimal control for continuous-time uncertain nonlinear systems using reinforcement learning

Cited by: 12
Authors
Zhao, Jingang [1 ,2 ]
Gan, Minggang [1 ,2 ]
Affiliations
[1] Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, State Key Lab Intelligent Control & Decis Complex, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Finite-horizon; optimal control; continuous-time; uncertain nonlinear systems; reinforcement learning; ADAPTIVE OPTIMAL-CONTROL; OPTIMAL TRACKING CONTROL; CONSTRAINED OPTIMAL-CONTROL; POLICY ITERATION; LINEAR-SYSTEMS; CONTROL SCHEME;
DOI
10.1080/00207721.2020.1797223
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline code
0812;
Abstract
This paper investigates the finite-horizon optimal control problem for continuous-time uncertain nonlinear systems, where the uncertainty refers to partially unknown system dynamics. Unlike the infinite-horizon case, the finite-horizon problem is difficult because the Hamilton-Jacobi-Bellman (HJB) equation is time-varying and must satisfy a terminal boundary constraint, which poses greater challenges; the partially unknown system dynamics introduce additional difficulty. The main innovation of this paper is a cyclic fixed-finite-horizon-based reinforcement learning algorithm that approximately solves the time-varying HJB equation. The algorithm consists of two phases: a data-collection phase over a fixed finite horizon and a parameter-update phase. A least-squares method links the two phases so that the optimal parameters are obtained cyclically. Finally, simulation results verify the effectiveness of the proposed cyclic fixed-finite-horizon-based reinforcement learning algorithm.
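The two-phase cycle described in the abstract can be illustrated on a problem small enough to check by hand. The sketch below is an assumption-laden toy, not the paper's algorithm: it uses a scalar linear-quadratic problem, a quadratic value model V(x, t) ≈ p(t)·x², and treats only the drift coefficient `a` as unknown to the learner (the input coefficient `b` is assumed known, as is common when dynamics are "partially unknown"). Phase 1 collects trajectory data over the fixed finite horizon under the current policy; phase 2 refits the time-varying weight p(t) by least squares; the two phases alternate cyclically.

```python
import numpy as np

# Illustrative sketch of a cyclic fixed-finite-horizon scheme on a scalar
# LQR problem, so the learned value weight can be checked against the
# Riccati equation. All names and the example system are assumptions.
a, b = 0.5, 1.0          # true dynamics x' = a*x + b*u (a unknown to learner)
q, r = 1.0, 1.0          # running cost q*x^2 + r*u^2
T, dt = 2.0, 0.01        # fixed finite horizon and integration step
N = int(T / dt)
pT = 1.0                 # terminal cost weight: V(x, T) = pT * x^2

def simulate(p, x0):
    """Phase 1: roll out one trajectory over the fixed horizon under the
    current policy u = -(b/r)*p(t)*x, recording states and stage costs.
    The drift `a` appears only inside this simulator, not in the learner."""
    xs, cs = np.empty(N + 1), np.empty(N)
    xs[0] = x0
    for k in range(N):
        u = -(b / r) * p[k] * xs[k]
        cs[k] = (q * xs[k] ** 2 + r * u ** 2) * dt
        xs[k + 1] = xs[k] + (a * xs[k] + b * u) * dt
    return xs, cs

p = np.full(N + 1, pT)   # initial guess for the time-varying weight p(t)
x0s = [-2.0, -1.0, 1.0, 2.0]

for _ in range(5):       # cyclic alternation of the two phases
    data = [simulate(p, x0) for x0 in x0s]
    p_new = np.empty(N + 1)
    p_new[N] = pT        # terminal boundary constraint
    for k in range(N):
        # Empirical cost-to-go from step k for each collected trajectory.
        J = np.array([cs[k:].sum() + pT * xs[N] ** 2 for xs, cs in data])
        phi = np.array([xs[k] ** 2 for xs, cs in data])
        # Phase 2: least-squares fit of J ~= p(t_k) * x^2.
        p_new[k] = (phi @ J) / (phi @ phi)
    p = p_new

# Reference: backward-Euler Riccati solution using the (here known) dynamics.
p_ric = np.empty(N + 1)
p_ric[N] = pT
for k in range(N - 1, -1, -1):
    p_ric[k] = p_ric[k + 1] + dt * (2 * a * p_ric[k + 1]
                                    - (b ** 2 / r) * p_ric[k + 1] ** 2 + q)
print(round(p[0], 3), round(p_ric[0], 3))
```

Since the value weight is time-varying and pinned at p(T) = pT, the learned p(0) exceeds pT (more cost remains early in the horizon) and, after a few cycles, tracks the Riccati solution up to discretization error. The paper's method handles nonlinear dynamics and richer basis functions; this sketch only mirrors the collect-then-fit structure.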
Pages: 2429-2440
Number of pages: 12