Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis

Citations: 0
Authors
Lin, Mingduo [1]
Zhao, Bo [1,2]
Liu, Derong [3,4]
Affiliations
[1] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Key Lab Ind Internet Things & Networked Control, Minist Educ, Chongqing 400065, Peoples R China
[3] Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China
[4] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming (ADP); data-based control; optimal control; output tracking control; policy optimization (PO); reinforcement learning (RL); GRADIENT METHODS; LINEAR-SYSTEMS; TIME-SYSTEMS;
DOI
10.1109/TNNLS.2024.3379207
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Optimal learning output tracking control (OLOTC) in a model-free manner has received increasing attention in both the intelligent control and reinforcement learning (RL) communities. Although model-free tracking control has been achieved via off-policy learning and $Q$-learning, direct policy learning, another popular RL idea that is easy to implement, has rarely been considered. To fill this gap, this article develops a novel model-free policy optimization (PO) algorithm to achieve OLOTC for unknown linear discrete-time (DT) systems. The iterative control policy is parameterized to directly improve the discounted value function of the augmented system via a gradient-based method. To implement this algorithm in a model-free manner, a two-point policy gradient (PG) algorithm is designed to approximate the gradient of the discounted value function from sampled states and reference trajectories. The global convergence of the model-free PO algorithm to the optimal value function is established given a sufficient number of samples and proper conditions. Finally, numerical simulation results are provided to validate the effectiveness of the proposed method.
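The two-point PG estimator described in the abstract is a zeroth-order scheme: the feedback gain is perturbed along a random direction, and the difference between the sampled discounted costs of the two perturbed gains yields a gradient estimate for the smoothed objective. Below is a minimal Python sketch of this idea, not the paper's algorithm: the plant matrices A and B, reference generator F, weights Q and R, horizon, smoothing radius, and step size are all hypothetical stand-ins, and the plant matrices enter only to simulate sampled trajectories, never the estimator itself.

```python
# Minimal sketch of two-point zeroth-order policy gradient for linear
# output tracking on an augmented state z = [x; r]. All matrices and
# hyperparameters below are illustrative assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def rollout(K, x0, r0, A, B, F, Q, R, gamma=0.95, T=200):
    """Discounted tracking cost of the policy u = -K z.

    A, B are the (in practice unknown) plant matrices: they appear here
    only to generate sampled trajectories, mimicking a real experiment.
    """
    x, r, cost = x0.copy(), r0.copy(), 0.0
    for t in range(T):
        z = np.concatenate([x, r])
        u = -K @ z
        e = x - r                          # output tracking error
        cost += gamma**t * (e @ Q @ e + u @ R @ u)
        x = A @ x + B @ u                  # plant step (sampled, not identified)
        r = F @ r                          # reference-trajectory step
    return cost

def two_point_gradient(K, radius, cost_fn):
    """Two-point estimate: grad J(K) ~= d/(2*radius) * (J(K+rU) - J(K-rU)) * U,
    with U uniform on the unit Frobenius sphere and d = K.size."""
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)
    # Common random numbers: both perturbed gains see the same sample.
    x0, r0 = rng.standard_normal(2), rng.standard_normal(2)
    delta = cost_fn(K + radius * U, x0, r0) - cost_fn(K - radius * U, x0, r0)
    return (K.size / (2.0 * radius)) * delta * U

if __name__ == "__main__":
    A = np.array([[1.0, 0.1], [0.0, 0.9]])     # hypothetical plant
    B = np.array([[0.0], [0.1]])
    F = 0.99 * np.eye(2)                        # reference generator
    Q, R = np.eye(2), 0.1 * np.eye(1)
    K = np.zeros((1, 4))                        # gain on augmented state [x; r]
    J = lambda K_, x0, r0: rollout(K_, x0, r0, A, B, F, Q, R)
    for it in range(200):                       # gradient descent on J(K)
        g = np.mean([two_point_gradient(K, 0.05, J) for _ in range(8)], axis=0)
        K -= 1e-4 * g
```

Evaluating both perturbed gains on the same sampled initial condition is the design choice that distinguishes the two-point estimator from the noisier one-point variant, which queries a single perturbed cost per direction.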
Pages: 1-12
Number of pages: 12
Related Papers
50 records in total
  • [31] A hierarchical learning control framework for tracking tasks, based on model-free principles
    Radac, Mircea-Bogdan
    Negru, Vlad
    Precup, Radu-Emil
    2019 23RD INTERNATIONAL CONFERENCE ON SYSTEM THEORY, CONTROL AND COMPUTING (ICSTCC), 2019, : 200 - 205
  • [32] Trajectory Tracking Control for Parafoil Systems Based on the Model-Free Adaptive Control Method
    Zhao, Linggong
    He, Weiliang
    Lv, Feikai
    Wang, Xiaoguang
    IEEE ACCESS, 2020, 8 : 152620 - 152636
  • [33] Model-Free Trajectory Optimization for Reinforcement Learning
    Akrour, Riad
    Abdolmaleki, Abbas
    Abdulsamad, Hany
    Neumann, Gerhard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [34] Model-free optimal tracking control for discrete-time system with delays using reinforcement Q-learning
    Liu, Yang
    Yu, Rui
    ELECTRONICS LETTERS, 2018, 54 (12) : 750 - 751
  • [35] Quantized measurements in Q-learning based model-free optimal control
    Tiistola, Sini
    Ritala, Risto
    Vilkko, Matti
    IFAC PAPERSONLINE, 2020, 53 (02): : 1640 - 1645
  • [36] Model-free PAC Time-Optimal Control Synthesis with Reinforcement Learning
    Liu, Mengyu
    Lu, Pengyuan
    Chen, Xin
    Sokolsky, Oleg
    Lee, Insup
    Kong, Fanxin
    2024 22ND ACM-IEEE INTERNATIONAL SYMPOSIUM ON FORMAL METHODS AND MODELS FOR SYSTEM DESIGN, MEMOCODE 2024, 2024, : 34 - 45
  • [37] Model-free optimal chiller loading method based on Q-learning
    Qiu, Shunian
    Li, Zhenhai
    Li, Zhengwei
    Zhang, Xinfang
    SCIENCE AND TECHNOLOGY FOR THE BUILT ENVIRONMENT, 2020, 26 (08) : 1100 - 1116
  • [38] Dyna-style Model-based reinforcement learning with Model-Free Policy Optimization
    Dong, Kun
    Luo, Yongle
    Wang, Yuxin
    Liu, Yu
    Qu, Chengeng
    Zhang, Qiang
    Cheng, Erkang
    Sun, Zhiyong
    Song, Bo
    KNOWLEDGE-BASED SYSTEMS, 2024, 287
  • [39] Hybrid-based model-free iterative learning control with optimal performance
    Kou, Zhicheng
    Sun, Jinggao
    Su, Guanghao
    Wang, Meng
    Yan, Huaicheng
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2023, 54 (10) : 2268 - 2280
  • [40] Online Model-Free Reinforcement Learning for Output Feedback Tracking Control of a Class of Discrete-Time Systems With Input Saturation
    Al-Mahasneh, Ahmad Jobran
    Anavatti, Sreenatha G.
    Garratt, Matthew A.
    IEEE ACCESS, 2022, 10 : 104966 - 104979