A Model-free Framework for Nonlinear Optimal Control Based on Solving HJB Equation

Citations: 0

Authors:

Wang, Yutian [1]

Ni, Yuan-Hua [1]

Guo, Xian [1]

Chen, Zengqiang [1]

Affiliations:

[1] Nankai Univ, Coll Artificial Intelligence, Tianjin 300071, Peoples R China

Source:

2022 41st Chinese Control Conference (CCC) | 2022

Funding:

National Natural Science Foundation of China

Keywords:

nonlinear systems; optimal control; stochastic systems; model-free learning

DOI:

Not available

Chinese Library Classification number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

This paper proposes a learning-based framework for finding the optimal feedback controller for a class of nonlinear optimal control problems by solving the HJB equation. By introducing Gaussian control noise for exploration, we aim to learn a robust controller in the resulting stochastic setting. This reformulation sacrifices some optimality, but, as reinforcement learning (RL) suggests, exploration noise is essential for model-free learning. The new stochastic optimal control problem is solved by finding the solution to the HJB equation, which is further reformulated as a constrained optimization problem that can be solved by a model-free algorithm. Compared with the Markovian framework of RL, our method is derived under a stochastic differential equation description in continuous time and space, and is therefore preferable from a theoretical point of view. We demonstrate the practical potential of the proposed method on three classical nonlinear control tasks.
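The abstract does not give the system dynamics, the cost, or the learning algorithm, so the following is only a rough illustration of the setting it describes: a continuous-time nonlinear system whose control input is perturbed by Gaussian noise, simulated with the Euler-Maruyama scheme. The pendulum dynamics, the quadratic running cost, the feedback gains, and the names `rollout` and `linear_policy` are all assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def rollout(policy, x0, sigma=0.1, dt=0.01, steps=500, seed=0):
    """Euler-Maruyama rollout of a pendulum with Gaussian noise on the control channel.

    Assumed (hypothetical) model: d(angle) = velocity dt,
    d(velocity) = (-sin(angle) + u) dt + sigma dW.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)  # state: [angle, angular velocity]
    cost = 0.0
    for _ in range(steps):
        u = policy(x)
        # Hypothetical quadratic running cost, integrated with a left-endpoint rule.
        cost += (x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u ** 2) * dt
        # Brownian increment over one step: mean 0, standard deviation sqrt(dt).
        dw = rng.normal(0.0, np.sqrt(dt))
        drift = np.array([x[1], -np.sin(x[0]) + u])
        # The exploration noise enters only where the control enters.
        x = x + drift * dt + np.array([0.0, sigma * dw])
    return x, cost

def linear_policy(x):
    # Hypothetical hand-picked feedback gains, not a learned controller.
    return -2.0 * x[0] - 1.0 * x[1]

final_state, total_cost = rollout(linear_policy, x0=[1.0, 0.0])
print(final_state, total_cost)
```

Setting `sigma=0.0` recovers the deterministic problem, which makes it easy to compare a controller's cost with and without exploration noise.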

Pages: 1711-1716

Page count: 6

