Discrete-Time Nonlinear Optimal Control Using Multi-Step Reinforcement Learning

Cited by: 7
Authors
An, Ningbo [1 ]
Wang, Qishao [1 ]
Zhao, Xiaochuan [2 ]
Wang, Qingyun [1 ]
Affiliations
[1] Beihang Univ, Dept Dynam & Control, Beijing 100191, Peoples R China
[2] China North Ind Grp Corp Ltd, Inst Comp Applicat Technol, Beijing 100095, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convergence; Optimal control; Heuristic algorithms; Mathematical models; Approximation algorithms; Reinforcement learning; Nonlinear systems; Optimal Bellman equation; Actor-critic architecture
DOI
10.1109/TCSII.2023.3343375
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
This brief solves the optimal control problem of discrete-time nonlinear systems by proposing a multi-step reinforcement learning (RL) algorithm. The proposed multi-step RL algorithm is built on the discrete-time optimal Bellman equation and combines the advantages of policy iteration (PI) and value iteration (VI). Owing to the multi-step integration mechanism, the algorithm is accelerated. The convergence of the multi-step RL algorithm is proved by mathematical induction. For practical implementation, neural networks (NNs) in an actor-critic architecture are introduced to approximate the iterative value functions and control policies. A numerical simulation of Chua's circuit illustrates the effectiveness of the proposed algorithm.
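For context, the discrete-time optimal Bellman equation referred to in the abstract is the standard one below; the exact multi-step iteration used in the brief is not reproduced in this record, so the N-step value update shown here is only a generic sketch of how such schemes interpolate between VI and PI:

V^*(x_k) = \min_{u_k} \left\{ U(x_k, u_k) + V^*(x_{k+1}) \right\}, \qquad x_{k+1} = f(x_k, u_k),

V_{i+1}(x_k) = \min_{u_k, \ldots, u_{k+N-1}} \left\{ \sum_{j=0}^{N-1} U(x_{k+j}, u_{k+j}) + V_i(x_{k+N}) \right\},

where U is the stage cost, f is the system dynamics, and N is the step horizon. Setting N = 1 recovers one-step value iteration, while larger N behaves more like the policy-evaluation step of policy iteration, which is the sense in which a multi-step mechanism can accelerate convergence.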
Pages: 2279-2283
Number of pages: 5