Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis

Citations: 0
Authors
Lin, Mingduo [1]
Zhao, Bo [1,2]
Liu, Derong [3,4]
Affiliations
[1] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Key Lab Ind Internet Things & Networked Control, Minist Educ, Chongqing 400065, Peoples R China
[3] Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China
[4] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming (ADP); data-based control; optimal control; output tracking control; policy optimization (PO); reinforcement learning (RL); GRADIENT METHODS; LINEAR-SYSTEMS; TIME-SYSTEMS;
DOI
10.1109/TNNLS.2024.3379207
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Optimal learning output tracking control (OLOTC) in a model-free manner has received increasing attention in both the intelligent control and reinforcement learning (RL) communities. Although model-free tracking control has been achieved via off-policy learning and $Q$-learning, direct policy learning, another popular RL idea that is easy to implement, has rarely been considered. To fill this gap, this article develops a novel model-free policy optimization (PO) algorithm to achieve OLOTC for unknown linear discrete-time (DT) systems. The iterative control policy is parameterized to directly improve the discounted value function of the augmented system via a gradient-based method. To implement this algorithm in a model-free manner, a two-point policy gradient (PG) algorithm is designed to approximate the gradient of the discounted value function from sampled states and reference trajectories. The global convergence of the model-free PO algorithm to the optimal value function is established given a sufficient number of samples and proper conditions. Finally, numerical simulation results are provided to validate the effectiveness of the proposed method.
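The two-point PG estimator described in the abstract is a zeroth-order scheme: the feedback gain is perturbed along a random direction, and the difference between the sampled discounted costs of the two perturbed gains yields a gradient estimate for the smoothed objective. Below is a minimal Python sketch of this idea, not the paper's algorithm: the plant matrices A and B, reference generator F, weights Q and R, horizon, smoothing radius, and step size are all hypothetical stand-ins, and the plant matrices enter only to simulate sampled trajectories, never the estimator itself.

```python
# Minimal sketch of two-point zeroth-order policy gradient for linear
# output tracking on an augmented state z = [x; r]. All matrices and
# hyperparameters below are illustrative assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def rollout(K, x0, r0, A, B, F, Q, R, gamma=0.95, T=200):
    """Discounted tracking cost of the policy u = -K z.

    A, B are the (in practice unknown) plant matrices: they appear here
    only to generate sampled trajectories, mimicking a real experiment.
    """
    x, r, cost = x0.copy(), r0.copy(), 0.0
    for t in range(T):
        z = np.concatenate([x, r])
        u = -K @ z
        e = x - r                          # output tracking error
        cost += gamma**t * (e @ Q @ e + u @ R @ u)
        x = A @ x + B @ u                  # plant step (sampled, not identified)
        r = F @ r                          # reference-trajectory step
    return cost

def two_point_gradient(K, radius, cost_fn):
    """Two-point estimate: grad J(K) ~= d/(2*radius) * (J(K+rU) - J(K-rU)) * U,
    with U uniform on the unit Frobenius sphere and d = K.size."""
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)
    # Common random numbers: both perturbed gains see the same sample.
    x0, r0 = rng.standard_normal(2), rng.standard_normal(2)
    delta = cost_fn(K + radius * U, x0, r0) - cost_fn(K - radius * U, x0, r0)
    return (K.size / (2.0 * radius)) * delta * U

if __name__ == "__main__":
    A = np.array([[1.0, 0.1], [0.0, 0.9]])     # hypothetical plant
    B = np.array([[0.0], [0.1]])
    F = 0.99 * np.eye(2)                        # reference generator
    Q, R = np.eye(2), 0.1 * np.eye(1)
    K = np.zeros((1, 4))                        # gain on augmented state [x; r]
    J = lambda K_, x0, r0: rollout(K_, x0, r0, A, B, F, Q, R)
    for it in range(200):                       # gradient descent on J(K)
        g = np.mean([two_point_gradient(K, 0.05, J) for _ in range(8)], axis=0)
        K -= 1e-4 * g
```

Evaluating both perturbed gains on the same sampled initial condition is the design choice that distinguishes the two-point estimator from the noisier one-point variant, which queries a single perturbed cost per direction.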
Pages: 1-12
Number of pages: 12
Related Papers
50 records in total
  • [31] A hierarchical learning control framework for tracking tasks, based on model-free principles
    Radac, Mircea-Bogdan
    Negru, Vlad
    Precup, Radu-Emil
    2019 23RD INTERNATIONAL CONFERENCE ON SYSTEM THEORY, CONTROL AND COMPUTING (ICSTCC), 2019, : 200 - 205
  • [32] Trajectory Tracking Control for Parafoil Systems Based on the Model-Free Adaptive Control Method
    Zhao, Linggong
    He, Weiliang
    Lv, Feikai
    Wang, Xiaoguang
    IEEE ACCESS, 2020, 8 : 152620 - 152636
  • [33] Model-Free Trajectory Optimization for Reinforcement Learning
    Akrour, Riad
    Abdolmaleki, Abbas
    Abdulsamad, Hany
    Neumann, Gerhard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [34] Model-free optimal tracking control for discrete-time system with delays using reinforcement Q-learning
    Liu, Yang
    Yu, Rui
    ELECTRONICS LETTERS, 2018, 54 (12) : 750 - 751
  • [35] Quantized measurements in Q-learning based model-free optimal control
    Tiistola, Sini
    Ritala, Risto
    Vilkko, Matti
    IFAC PAPERSONLINE, 2020, 53 (02): : 1640 - 1645
  • [36] Model-free PAC Time-Optimal Control Synthesis with Reinforcement Learning
    Liu, Mengyu
    Lu, Pengyuan
    Chen, Xin
    Sokolsky, Oleg
    Lee, Insup
    Kong, Fanxin
    2024 22ND ACM-IEEE INTERNATIONAL SYMPOSIUM ON FORMAL METHODS AND MODELS FOR SYSTEM DESIGN, MEMOCODE 2024, 2024, : 34 - 45
  • [37] Model-free optimal chiller loading method based on Q-learning
    Qiu, Shunian
    Li, Zhenhai
    Li, Zhengwei
    Zhang, Xinfang
    SCIENCE AND TECHNOLOGY FOR THE BUILT ENVIRONMENT, 2020, 26 (08) : 1100 - 1116
  • [38] Dyna-style Model-based reinforcement learning with Model-Free Policy Optimization
    Dong, Kun
    Luo, Yongle
    Wang, Yuxin
    Liu, Yu
    Qu, Chengeng
    Zhang, Qiang
    Cheng, Erkang
    Sun, Zhiyong
    Song, Bo
    KNOWLEDGE-BASED SYSTEMS, 2024, 287
  • [39] Hybrid-based model-free iterative learning control with optimal performance
    Kou, Zhicheng
    Sun, Jinggao
    Su, Guanghao
    Wang, Meng
    Yan, Huaicheng
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2023, 54 (10) : 2268 - 2280
  • [40] Online Model-Free Reinforcement Learning for Output Feedback Tracking Control of a Class of Discrete-Time Systems With Input Saturation
    Al-Mahasneh, Ahmad Jobran
    Anavatti, Sreenatha G.
    Garratt, Matthew A.
    IEEE ACCESS, 2022, 10 : 104966 - 104979