Dynamic compensator-based near-optimal control for unknown nonaffine systems via integral reinforcement learning

Cited by: 8
Authors
Lin, Jinquan [1 ]
Zhao, Bo [2 ]
Liu, Derong [3 ,4 ]
Wang, Yonghua [1 ]
Affiliations
[1] Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China
[2] Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China
[3] Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China
[4] Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607 USA
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Neuro-dynamic programming; Adaptive dynamic programming; Reinforcement learning; Optimal control; Neural networks; Dynamic compensator; CONTINUOUS-TIME; EXPERIENCE REPLAY; DESIGN; ALGORITHM;
DOI
10.1016/j.neucom.2023.126973
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, a dynamic compensator-based near-optimal control approach for unknown nonaffine nonlinear systems is developed by using integral reinforcement learning. Since the system dynamics are unknown, it is difficult to obtain the optimal control policy via neuro-dynamic programming. To address this problem, a general dynamic compensator is introduced as the virtual control input, which augments the unknown nonaffine nonlinear system into a partially unknown affine system. For the augmented system, a novel quadratic value function is designed in terms of the system states, the actual control input and the virtual control input. The optimal control of the augmented system can be regarded as a near-optimal control for the original system, since the augmented optimal value function is an upper bound of the original optimal value function. To avoid identifying the system dynamics, the integral reinforcement learning framework is utilized to derive the optimal control from the solution of the Hamilton-Jacobi-Bellman equation via a critic-only structure. Meanwhile, the weight learning rule of the critic neural network is presented with the experience replay technique to relax the persistence of excitation condition. Moreover, the uniform ultimate boundedness of the weight estimation errors and the stability of the closed-loop system are guaranteed by using Lyapunov's direct method. Finally, simulation results of two examples demonstrate the effectiveness of the developed dynamic compensator-based near-optimal control method.
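To make the augmentation step concrete, here is a minimal sketch of the construction described above; the integrator-type compensator and the symbols f, F, G, Q, R, S are illustrative assumptions rather than the paper's exact formulation, which allows a more general dynamic compensator.

    Unknown nonaffine system:            \dot{x} = f(x, u)
    Dynamic compensator (virtual input v, assumed integrator):  \dot{u} = v
    Augmented state:                     X = [x^\top, u^\top]^\top
    Partially unknown affine form:       \dot{X} = F(X) + G v,  with  F(X) = [f(x,u)^\top, 0^\top]^\top,  G = [0, I]^\top
    Augmented quadratic value function:  V(X(t)) = \int_t^\infty ( x^\top Q x + u^\top R u + v^\top S v ) d\tau

Minimizing this augmented value function yields the near-optimal control for the original nonaffine system, in line with the upper-bound argument stated in the abstract.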
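The abstract also outlines a critic-only integral reinforcement learning scheme whose weight update uses experience replay. The snippet below is a generic Python sketch of such an update, not the paper's exact learning rule; the feature vector phi, the normalized gradient step, and the replay-buffer layout are assumptions made purely for illustration.

    import numpy as np

    def phi(X):
        # Hypothetical quadratic feature vector for the critic network
        # (the paper's basis functions are not given in the abstract).
        x1, x2, u = X
        return np.array([x1**2, x2**2, u**2, x1*x2, x1*u, x2*u])

    def bellman_residual(W, X_prev, X_now, integral_cost):
        # Integral RL Bellman error over one reinforcement interval [t-T, t]:
        # e = W^T phi(X(t)) + int_{t-T}^{t} r(x, u, v) dtau - W^T phi(X(t-T))
        return W @ phi(X_now) + integral_cost - W @ phi(X_prev)

    def update_critic(W, replay_buffer, lr=0.5):
        # One gradient-descent step on the squared Bellman error, summed over
        # the current and stored samples; replaying past data is what relaxes
        # the persistence of excitation condition.
        grad = np.zeros_like(W)
        for X_prev, X_now, integral_cost in replay_buffer:
            e = bellman_residual(W, X_prev, X_now, integral_cost)
            dphi = phi(X_now) - phi(X_prev)
            grad += e * dphi / (1.0 + dphi @ dphi) ** 2  # normalized gradient
        return W - lr * grad

Here W is the critic weight vector, and each replay-buffer entry stores the augmented state at the two ends of one reinforcement interval together with the measured integral cost over that interval.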
Pages: 9