Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis

Cited by: 0
Authors
Lin, Mingduo [1 ]
Zhao, Bo [1 ,2 ]
Liu, Derong [3 ,4 ]
Affiliations
[1] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Key Lab Ind Internet Things & Networked Control, Minist Educ, Chongqing 400065, Peoples R China
[3] Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China
[4] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming (ADP); data-based control; optimal control; output tracking control; policy optimization (PO); reinforcement learning (RL); GRADIENT METHODS; LINEAR-SYSTEMS; TIME-SYSTEMS;
DOI
10.1109/TNNLS.2024.3379207
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Optimal learning output tracking control (OLOTC) in a model-free manner has received increasing attention in both the intelligent control and reinforcement learning (RL) communities. Although model-free tracking control has been achieved via off-policy learning and Q-learning, direct policy learning, another popular RL idea that is easy to implement, has rarely been considered. To fill this gap, this article develops a novel model-free policy optimization (PO) algorithm to achieve OLOTC for unknown linear discrete-time (DT) systems. The iterative control policy is parameterized to directly improve the discounted value function of the augmented system via a gradient-based method. To implement this algorithm in a model-free manner, a model-free two-point policy gradient (PG) algorithm is designed to approximate the gradient of the discounted value function using sampled states and reference trajectories. The global convergence of the model-free PO algorithm to the optimal value function is demonstrated given a sufficient number of samples and appropriate conditions. Finally, numerical simulation results are provided to validate the effectiveness of the proposed method.
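To make the two-point PG idea concrete, below is a minimal Python sketch of a two-point (zeroth-order) gradient estimate for a linear state-feedback gain under a discounted LQ cost. All quantities here (the augmented dynamics A and B, weights Q and R, discount factor, horizon, smoothing radius, and step size) are illustrative assumptions, not values from the paper; in the paper's model-free setting, the costs would be measured from rollouts of the real system along sampled states and reference trajectories rather than simulated with known matrices.

```python
import numpy as np

# Sketch of a two-point (zeroth-order) policy-gradient step for a linear
# state-feedback gain K under a discounted LQ cost. All numbers below --
# the assumed augmented dynamics (A, B), weights (Q, R), discount gamma,
# horizon, smoothing radius r, and step size -- are illustrative only.

rng = np.random.default_rng(0)
n, m = 4, 2                          # assumed augmented state / input sizes
A = 0.9 * np.eye(n)
A[0, 1] = 0.1
B = 0.1 * rng.standard_normal((n, m))
Q, R = np.eye(n), 0.1 * np.eye(m)
gamma, horizon, r = 0.95, 200, 0.05

def discounted_cost(K, x0):
    """Roll out u_t = -K x_t and accumulate the discounted quadratic cost."""
    x, cost = x0.copy(), 0.0
    for t in range(horizon):
        u = -K @ x
        cost += gamma**t * (x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost

def two_point_gradient(K, x0, r):
    """Estimate the gradient from two symmetric perturbations of K."""
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)           # uniform direction on the unit sphere
    diff = discounted_cost(K + r * U, x0) - discounted_cost(K - r * U, x0)
    return (K.size / (2.0 * r)) * diff * U

# Gradient-descent policy improvement, averaging estimates over sampled
# initial states (standing in for the sampled states and reference
# trajectories of the model-free setting).
K = np.zeros((m, n))
for _ in range(50):
    grads = [two_point_gradient(K, rng.standard_normal(n), r) for _ in range(20)]
    K -= 1e-3 * np.mean(grads, axis=0)
```

The two-point form evaluates the cost at symmetric perturbations of the same gain from the same initial state, so much of the rollout-dependent noise cancels in the difference, which is why two-point estimators typically have lower variance than one-point ones.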
Pages: 1 - 12
Page count: 12
Related Papers
50 records in total
  • [41] Track tracking control of sanitation vehicle based on model-free adaptive iterative learning control
    Yao W.-L.
    Pang Z.
    Chi R.-H.
    Shao W.
    Kongzhi Lilun Yu Yingyong/Control Theory and Applications, 2022, 39 (01) : 101 - 108
  • [42] Redefined Output Model-Free Adaptive Control Method and Unmanned Surface Vehicle Heading Control
    Liao, Yulei
    Jiang, Quanquan
    Du, Tingpeng
    Jiang, Wen
    IEEE JOURNAL OF OCEANIC ENGINEERING, 2020, 45 (03) : 714 - 723
  • [43] Off-Policy: Model-Free Optimal Synchronization Control for Complex Dynamical Networks
    Wang, Jianfeng
    Wang, Yan
    Ji, Zhicheng
    NEURAL PROCESSING LETTERS, 2022, 54 (04) : 2941 - 2958
  • [44] Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey
    Liu, Yongshuai
    Halev, Avishai
    Liu, Xin
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4508 - 4515
  • [46] Model-Free δ-Policy Iteration Based on Damped Newton Method for Nonlinear Continuous-Time H∞ Tracking Control
    Wang, Qi
    arXiv, 2024,
  • [47] Model-free fuzzy tracking control of a nuclear reactor
    Marseguerra, M
    Zio, E
    ANNALS OF NUCLEAR ENERGY, 2003, 30 (09) : 953 - 981
  • [48] Cascaded Model-Free Control for trajectory tracking of quadrotors
    Bekcheva, Maria
    Join, Cedric
    Mounier, Hugues
    2018 INTERNATIONAL CONFERENCE ON UNMANNED AIRCRAFT SYSTEMS (ICUAS), 2018, : 1359 - 1368
  • [49] Three-level hierarchical model-free learning approach to trajectory tracking control
    Radac, Mircea-Bogdan
    Precup, Radu-Emil
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2016, 55 : 103 - 118
  • [50] Model-free tracking control of pneumatic bellow actuator based on broad learning system
    Zhao S.-Y.
    Yan Z.
    Meng Q.-X.
    Xiao H.
    Lai X.-Z.
    Wu M.
    Kongzhi yu Juece/Control and Decision, 2024, 39 (01) : 121 - 128