H∞ tracking control for perturbed discrete-time systems using On/Off policy Q-learning algorithms

Cited by: 1
Authors
Dao, Phuong Nam [1 ]
Dao, Quang Huy [1 ]
Affiliations
[1] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn, Hanoi, Vietnam
Keywords
Perturbed discrete-time systems; Q-learning; On/off policy algorithm; Model-free control; Reinforcement learning control; ADAPTIVE OPTIMAL-CONTROL;
D O I
10.1016/j.chaos.2025.116459
CLC classification
O1 [Mathematics];
Discipline codes
0701 ; 070101 ;
Abstract
The widely studied H∞ zero-sum game problem incorporates external disturbance directly into the optimal control formulation. In this article, two model-free Q-learning algorithms based on H∞ tracking control are proposed for perturbed discrete-time systems subject to external disturbance. A modification of the output optimal control problem is also presented. For the optimal tracking control problem, a discount factor is required to keep the cost function finite, and the Riccati equation is modified accordingly. Using the deviation between Q functions at two consecutive time steps, the underlying principle of on/off-policy learning, and the H∞ zero-sum game formulation, two on/off-policy Q-learning algorithms for H∞ tracking control are proposed. Then, by computing the Q function, the influence of probing noise on the Q function is analyzed. An analysis of solution equivalence shows that convergence and tracking are guaranteed under the proposed algorithms. Finally, simulation studies on an F-16 aircraft model assess the validity of the presented control schemes.
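The Q-function recursion underlying the abstract can be illustrated with a small model-based sketch of value iteration for a discrete-time zero-sum LQ game. This is only an illustration: the paper's algorithms are model-free and estimate the quadratic Q-function kernel H from measured data, whereas the sketch below assumes known (and purely hypothetical) system matrices A, B, D, weights, attenuation level, and discount factor.

```python
import numpy as np

# Illustrative, model-based value iteration for the discounted zero-sum LQ game
#   x_{k+1} = A x_k + B u_k + D w_k,
#   cost sum_k gamma^k (x'Qx x + u'R u - beta2 * w'w),
# whose fixed point the paper's model-free Q-learning schemes approximate.
# All numbers here are hypothetical examples, not taken from the paper.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.1], [0.0]])
Qx = np.eye(2)          # state weight
R = np.eye(1)           # control weight
beta2 = 5.0             # squared H-infinity attenuation level
gamma = 0.95            # discount factor (needed for the tracking cost)

n, m, q = 2, 1, 1
P = np.zeros((n, n))    # value kernel, V(x) = x' P x

for _ in range(500):
    # Q-function kernel H with Q(x, u, w) = [x; u; w]' H [x; u; w]
    H = np.block([
        [Qx + gamma * A.T @ P @ A, gamma * A.T @ P @ B, gamma * A.T @ P @ D],
        [gamma * B.T @ P @ A, R + gamma * B.T @ P @ B, gamma * B.T @ P @ D],
        [gamma * D.T @ P @ A, gamma * D.T @ P @ B,
         -beta2 * np.eye(q) + gamma * D.T @ P @ D],
    ])
    Hxx = H[:n, :n]
    Hxu, Hxw = H[:n, n:n + m], H[:n, n + m:]
    Huu, Huw = H[n:n + m, n:n + m], H[n:n + m, n + m:]
    Hwu, Hww = H[n + m:, n:n + m], H[n + m:, n + m:]
    # minimax over (u, w) gives the updated value kernel
    Huw_blk = np.block([[Huu, Huw], [Hwu, Hww]])
    Hx_blk = np.hstack([Hxu, Hxw])
    P_next = Hxx - Hx_blk @ np.linalg.solve(Huw_blk, Hx_blk.T)
    if np.linalg.norm(P_next - P) < 1e-10:
        P = P_next
        break
    P = P_next

# saddle-point feedback gains: u = -Ku x, w = -Kw x
gains = np.linalg.solve(Huw_blk, Hx_blk.T)
Ku, Kw = gains[:m], gains[m:]
```

In the model-free setting described in the abstract, the kernel H is instead identified from state, input, and disturbance data via least squares on the temporal difference between Q functions at consecutive time steps, which removes the need for A, B, and D.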
Pages: 20
Related papers
46 references in total
[11]   Off-policy two-dimensional reinforcement learning for optimal tracking control of batch processes with network-induced dropout and disturbances [J].
Jiang, Xueying ;
Huang, Min ;
Shi, Huiyuan ;
Wang, Xingwei ;
Zhang, Yanfeng .
ISA TRANSACTIONS, 2024, 144 :228-244
[12]   Reinforcement learning and cooperative H∞ output regulation of linear continuous-time multi-agent systems [J].
Jiang, Yi ;
Gao, Weinan ;
Wu, Jin ;
Chai, Tianyou ;
Lewis, Frank L. .
AUTOMATICA, 2023, 148
[13]   H∞ control of linear discrete-time systems: Off-policy reinforcement learning [J].
Kiumarsi, Bahare ;
Lewis, Frank L. ;
Jiang, Zhong-Ping .
AUTOMATICA, 2017, 78 :144-152
[14]   Optimal dynamic Control Allocation with guaranteed constraints and online Reinforcement Learning [J].
Kolaric, Patrik ;
Lopez, Victor G. ;
Lewis, Frank L. .
AUTOMATICA, 2020, 122
[15]   Compensator-Based Self-Learning: Optimal Operational Control for Two-Time-Scale Systems With Input Constraints [J].
Li, Jinna ;
Yang, Mingwei ;
Lewis, Frank L. ;
Zheng, Meng .
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (07) :9465-9475
[16]   NN-Based Reinforcement Learning Optimal Control for Inequality-Constrained Nonlinear Discrete-Time Systems With Disturbances [J].
Li, Shu ;
Ding, Liang ;
Zheng, Miao ;
Liu, Zixuan ;
Li, Xinyu ;
Yang, Huaiguang ;
Gao, Haibo ;
Deng, Zongquan .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (11) :15507-15516
[17]   Static-Output-Feedback Based Robust Fuzzy Wheelbase Preview Control for Uncertain Active Suspensions With Time Delay and Finite Frequency Constraint [J].
Li, Wenfeng ;
Xie, Zhengchao ;
Zhao, Jing ;
Wong, Pak Kin ;
Wang, Hui ;
Wang, Xiaowei .
IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2021, 8 (03) :664-678
[18]   Policy Optimization Adaptive Dynamic Programming for Optimal Control of Input-Affine Discrete-Time Nonlinear Systems [J].
Lin, Mingduo ;
Zhao, Bo .
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2023, 53 (07) :4339-4350
[19]   Finite-Horizon Optimal Control for Nonlinear Multi-Input Systems With Online Adaptive Integral Reinforcement Learning [J].
Lv, Yongfeng ;
Zhang, Wan ;
Zhao, Jun ;
Zhao, Xiaowei .
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 :802-812
[20]   Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning [J].
Ma, Xiao ;
Yuan, Yuan .
JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2024, 361 (07)