A Model-free Framework for Nonlinear Optimal Control Based on Solving HJB Equation

Cited by: 0
Authors
Wang, Yutian [1]
Ni, Yuan-Hua [1]
Guo, Xian [1]
Chen, Zengqiang [1]
Affiliations
[1] Nankai Univ, Coll Artificial Intelligence, Tianjin 300071, Peoples R China
Source
2022 41ST CHINESE CONTROL CONFERENCE (CCC) | 2022
Funding
National Natural Science Foundation of China;
Keywords
nonlinear systems; optimal control; stochastic systems; model-free learning;
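DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract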
This paper proposes a learning-based framework for finding the optimal feedback controller for a class of nonlinear optimal control problems by solving the Hamilton-Jacobi-Bellman (HJB) equation. By introducing Gaussian control noise for exploration, we aim to learn a robust controller in the resulting stochastic setting. This reformulation sacrifices some optimality but, as suggested in reinforcement learning (RL), exploration noise is essential to enable model-free learning. The new stochastic optimal control problem is solved by finding the solution to the HJB equation, which is further reformulated as a constrained optimization problem that can be solved by a model-free algorithm. Compared with the Markovian framework of RL, our method is derived under a stochastic differential equation description in continuous time and space, and is thus preferable from a theoretical point of view. We demonstrate the practical potential of the proposed method on three classical nonlinear control tasks.
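To illustrate what "Gaussian control noise" under a stochastic differential equation description can look like in practice, here is a minimal Python sketch. It is not the authors' algorithm: this record does not give the system dynamics, the cost, or the learning rule. The pendulum dynamics, the quadratic running cost, the linear feedback gains, and the names `simulate`, `policy`, and `sigma` are all assumptions chosen for illustration. The sketch only shows an Euler-Maruyama rollout in which Brownian noise enters through the control channel.

```python
import numpy as np

def simulate(policy, x0, sigma=0.1, dt=0.01, steps=500, seed=0):
    """Euler-Maruyama rollout of a hypothetical pendulum with noisy control.

    Assumed dynamics (illustrative only, not taken from the paper):
        d(theta) = omega dt
        d(omega) = -sin(theta) dt + u dt + sigma dW
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)            # state: [theta, omega]
    traj = np.empty((steps + 1, 2))
    traj[0] = x
    cost = 0.0
    for k in range(steps):
        theta, omega = x
        u = policy(x)                        # feedback control at current state
        dw = rng.normal(0.0, np.sqrt(dt))    # Brownian increment ~ N(0, dt)
        # Assumed quadratic running cost, accumulated by a left Riemann sum
        cost += (theta**2 + 0.1 * omega**2 + 0.01 * u**2) * dt
        # Exploration noise is added in the control channel
        x = np.array([theta + omega * dt,
                      omega + (-np.sin(theta) + u) * dt + sigma * dw])
        traj[k + 1] = x
    return traj, cost

# Hypothetical linear feedback policy, used only to exercise the simulator
traj, cost = simulate(lambda x: -2.0 * x[0] - 2.0 * x[1], [1.0, 0.0])
print(traj[-1], cost)
```

Setting `sigma=0.0` recovers the deterministic problem, which makes it easy to compare a controller's cost with and without exploration noise.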
Pages: 1711-1716
Number of pages: 6
Related papers
11 in total
[1]  
[Anonymous], 2016, OpenAI Gym
[2]   Solving high-dimensional partial differential equations using deep learning [J].
Han, Jiequn ;
Jentzen, Arnulf ;
E, Weinan .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2018, 115 (34) :8505-8510
[3]  
Karatzas I., 2012, Brownian motion and stochastic calculus, V113
[4]   Nonlinear Optimal Control Design for Underactuated Two-Wheeled Inverted Pendulum Mobile Platform [J].
Kim, Sangtae ;
Kwon, SangJoo .
IEEE-ASME TRANSACTIONS ON MECHATRONICS, 2017, 22 (06) :2803-2808
[5]   Policy iterations for reinforcement learning problems in continuous time and space - Fundamental theory and methods [J].
Lee, Jaeyoung ;
Sutton, Richard S. .
AUTOMATICA, 2021, 126
[6]   ADAPTIVE DEEP LEARNING FOR HIGH-DIMENSIONAL HAMILTON-JACOBI-BELLMAN EQUATIONS [J].
Nakamura-Zimmerer, Tenavi ;
Gong, Qi ;
Kang, Wei .
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2021, 43 (02) :A1221-A1247
[7]  
Onken Derek, 2021, ARXIV210403270 MATH
[8]  
Pereira MA, 2019, ROBOTICS: SCIENCE AND SYSTEMS XV
[9]   DGM: A deep learning algorithm for solving partial differential equations [J].
Sirignano, Justin ;
Spiliopoulos, Konstantinos .
JOURNAL OF COMPUTATIONAL PHYSICS, 2018, 375 :1339-1364
[10]  
Weinan E, 2020, ARXIV200813333