Two-loop reinforcement learning algorithm for finite-horizon optimal control of continuous-time affine nonlinear systems

Cited by: 5
Authors
Chen, Zhe [1 ,2 ,3 ]
Xue, Wenqian [4 ,5 ]
Li, Ning [1 ,2 ,3 ]
Lewis, Frank L. [6 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
[2] Minist Educ China, Key Lab Syst Control & Informat Proc, Shanghai, Peoples R China
[3] Shanghai Engn Res Ctr Intelligent Control & Manag, Shanghai, Peoples R China
[4] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang, Peoples R China
[5] Northeastern Univ, Int Joint Res Lab Integrated Automat, Shenyang, Peoples R China
[6] Univ Texas Arlington, UTA Res Inst, Arlington, TX 76019 USA
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
continuous-time nonlinear system; finite-horizon optimal control; iterative learning control; policy iteration; reinforcement learning; value function approximation; OPTIMAL TRACKING CONTROL;
DOI
10.1002/rnc.5826
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
This article proposes three novel time-varying policy iteration algorithms for the finite-horizon optimal control problem of continuous-time affine nonlinear systems. We first propose a model-based time-varying policy iteration algorithm that considers time-varying solutions to the Hamilton-Jacobi-Bellman equation for finite-horizon optimal control. Building on this algorithm, value function approximation is applied to the Bellman equation by constructing neural networks with time-varying weights. A novel update law for the time-varying weights is put forward based on the idea of iterative learning control, which obtains optimal solutions more efficiently than previous works. Considering that system models may be unknown in real applications, we also propose a partially model-free time-varying policy iteration algorithm that applies integral reinforcement learning to acquire the time-varying value function. Moreover, convergence, stability, and optimality analyses are provided for every algorithm. Finally, simulations for different cases are given to verify the convenience and effectiveness of the proposed algorithms.
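To illustrate the kind of iteration the abstract describes, the following is a minimal, hypothetical sketch (not the paper's exact algorithm) of model-based policy iteration for a scalar finite-horizon LQR problem: policy evaluation solves a Lyapunov differential equation backward in time for a time-varying value kernel P(t), and policy improvement updates the time-varying feedback gain. All system and cost parameters below are illustrative assumptions.

```python
import numpy as np

# Illustrative scalar problem: dx/dt = a*x + b*u,
# cost J = phi*x(T)^2 + integral of (q*x^2 + r*u^2) dt over [0, T].
a, b = 1.0, 1.0            # system coefficients (assumed for illustration)
q, r, phi = 1.0, 1.0, 0.5  # running-cost weights and terminal weight
T, N = 2.0, 2000           # horizon length and number of grid steps
dt = T / N

def evaluate_policy(K):
    """Policy evaluation: integrate the Lyapunov differential equation
    -dP/dt = 2*(a - b*K(t))*P + q + r*K(t)^2, P(T) = phi,
    backward in time with an explicit Euler step."""
    P = np.empty(N + 1)
    P[N] = phi
    for k in range(N, 0, -1):
        ac = a - b * K[k]  # closed-loop coefficient under u = -K(t)*x
        P[k - 1] = P[k] + dt * (2.0 * ac * P[k] + q + r * K[k] ** 2)
    return P

# Policy iteration: alternate evaluation and improvement of the
# time-varying gain K(t).
K = np.zeros(N + 1)        # initial admissible policy u = 0
for _ in range(20):
    P = evaluate_policy(K)
    K = (b / r) * P        # policy improvement: u = -(b/r)*P(t)*x

# Cross-check: integrate the Riccati ODE
# -dP/dt = 2*a*P - (b^2/r)*P^2 + q, P(T) = phi, with the same scheme.
P_ric = np.empty(N + 1)
P_ric[N] = phi
for k in range(N, 0, -1):
    P_ric[k - 1] = P_ric[k] + dt * (
        2.0 * a * P_ric[k] - (b ** 2 / r) * P_ric[k] ** 2 + q)

print(abs(P[0] - P_ric[0]))  # iterated value kernel matches the Riccati one
```

On this discretization the fixed point of the evaluation/improvement loop coincides with the backward-Euler solution of the Riccati differential equation, so the two values of P(0) agree; the paper's algorithms generalize this scheme to affine nonlinear dynamics via neural-network value function approximation and, in the partially model-free case, integral reinforcement learning.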
Pages: 393-420
Page count: 28