H∞ tracking control for perturbed discrete-time systems using On/Off policy Q-learning algorithms

Cited by: 1
Authors
Dao, Phuong Nam [1 ]
Dao, Quang Huy [1 ]
Affiliations
[1] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn, Hanoi, Vietnam
Keywords
Perturbed discrete-time systems; Q-learning; On/off policy algorithm; Model-free control; Reinforcement learning control; Adaptive optimal control
DOI
10.1016/j.chaos.2025.116459
Chinese Library Classification
O1 [Mathematics];
Discipline Classification Code
0701; 070101;
Abstract
The widely studied H∞ zero-sum game problem incorporates external disturbance into the optimal control problem. In this article, two model-free Q-learning algorithms based on H∞ tracking control are proposed for perturbed discrete-time systems in the presence of external disturbance. Moreover, the output optimal control problem is also modified. For the optimal tracking control problem, a discount factor is required to keep the cost function finite, and the Riccati equation is modified accordingly. Using the deviation between the Q functions at two consecutive time steps, the underlying on/off-policy principle, and the H∞ zero-sum game formulation, two on/off-policy Q-learning algorithms based on H∞ tracking control are proposed. The influence of probing noise on the computed Q function is then analyzed. An analysis of solution equivalence proves that the proposed algorithms guarantee convergence and tracking. Finally, simulation studies are carried out on an F-16 aircraft model to assess the validity of the presented control schemes.
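For orientation, the discounted performance index and the Q-function Bellman equation that the abstract alludes to take the following standard form in the discrete-time H∞ zero-sum game literature. This is a sketch in generic notation rather than the paper's exact formulation: z_k is the augmented tracking state, u_k the control input, d_k the disturbance, Q_z and R weighting matrices, \gamma \in (0, 1] the discount factor, and \gamma_d the prescribed disturbance attenuation level.

    J(u, d) = \sum_{k=0}^{\infty} \gamma^k \left( z_k^\top Q_z z_k + u_k^\top R u_k - \gamma_d^2 \, d_k^\top d_k \right)

    Q^*(z_k, u_k, d_k) = z_k^\top Q_z z_k + u_k^\top R u_k - \gamma_d^2 \, d_k^\top d_k + \gamma \min_{u} \max_{d} Q^*(z_{k+1}, u, d)

Because the optimal Q function can be identified from measured trajectories through the deviation between its values at two consecutive time steps, the relation above is solvable (for instance, by least squares over a quadratic parameterization) without knowledge of the system matrices, which is what makes the on/off-policy algorithms model-free. The discount factor \gamma keeps J finite along a persistent reference trajectory, and \gamma_d plays the role of the attenuation bound in the modified game Riccati equation mentioned in the abstract.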
Pages: 20