Discrete-Time Nonlinear Optimal Control Using Multi-Step Reinforcement Learning

Cited by: 7
Authors
An, Ningbo [1 ]
Wang, Qishao [1 ]
Zhao, Xiaochuan [2 ]
Wang, Qingyun [1 ]
Affiliations
[1] Beihang Univ, Dept Dynam & Control, Beijing 100191, Peoples R China
[2] China North Ind Grp Corp Ltd, Inst Comp Applicat Technol, Beijing 100095, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convergence; Optimal control; Heuristic algorithms; Mathematical models; Approximation algorithms; Reinforcement learning; Nonlinear systems; Optimal Bellman equation; Actor-critic architecture
DOI
10.1109/TCSII.2023.3343375
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
This brief solves the optimal control problem of discrete-time nonlinear systems by proposing a multi-step reinforcement learning (RL) algorithm. The proposed multi-step RL algorithm is built on the discrete-time optimal Bellman equation and combines the advantages of policy iteration (PI) and value iteration (VI). Owing to the multi-step integration mechanism, the algorithm is accelerated. The convergence of the multi-step RL algorithm is proved by mathematical induction. For practical implementation, neural networks (NNs) in an actor-critic architecture are introduced to approximate the iterative value functions and control policies. A numerical simulation of Chua's circuit illustrates the effectiveness of the proposed algorithm.
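For context, the discrete-time optimal Bellman equation referred to in the abstract is the standard one below; the exact multi-step iteration used in the brief is not reproduced in this record, so the N-step value update shown here is only a generic sketch of how such schemes interpolate between VI and PI:

V^*(x_k) = \min_{u_k} \left\{ U(x_k, u_k) + V^*(x_{k+1}) \right\}, \qquad x_{k+1} = f(x_k, u_k),

V_{i+1}(x_k) = \min_{u_k, \ldots, u_{k+N-1}} \left\{ \sum_{j=0}^{N-1} U(x_{k+j}, u_{k+j}) + V_i(x_{k+N}) \right\},

where U is the stage cost, f is the system dynamics, and N is the step horizon. Setting N = 1 recovers one-step value iteration, while larger N behaves more like the policy-evaluation step of policy iteration, which is the sense in which a multi-step mechanism can accelerate convergence.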
Pages: 2279-2283
Number of pages: 5