H∞ tracking control for perturbed discrete-time systems using On/Off policy Q-learning algorithms

Cited by: 1
Authors
Dao, Phuong Nam [1 ]
Dao, Quang Huy [1 ]
Affiliations
[1] Hanoi Univ Sci & Technol, Sch Elect & Elect Engn, Hanoi, Vietnam
Keywords
Perturbed discrete-time systems; Q-learning; On/off policy algorithm; Model-free control; Reinforcement learning control; Adaptive optimal control
DOI
10.1016/j.chaos.2025.116459
Chinese Library Classification
O1 [Mathematics];
Discipline Classification Code
0701; 070101;
Abstract
The widely studied H∞ zero-sum game problem incorporates external disturbance into the optimal control problem. In this article, two model-free Q-learning algorithms based on H∞ tracking control are proposed for perturbed discrete-time systems in the presence of external disturbance. Moreover, the output optimal control problem is also modified. For the optimal tracking control problem, a discount factor is required to keep the cost function finite, and the Riccati equation is modified accordingly. Using the deviation between the Q functions at two consecutive time steps, the underlying on/off-policy principle, and the H∞ zero-sum game formulation, two on/off-policy Q-learning algorithms based on H∞ tracking control are proposed. The influence of probing noise on the computed Q function is then analyzed. An analysis of solution equivalence proves that the proposed algorithms guarantee convergence and tracking. Finally, simulation studies are carried out on an F-16 aircraft model to assess the validity of the presented control schemes.
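For orientation, the discounted performance index and the Q-function Bellman equation that the abstract alludes to take the following standard form in the discrete-time H∞ zero-sum game literature. This is a sketch in generic notation rather than the paper's exact formulation: z_k is the augmented tracking state, u_k the control input, d_k the disturbance, Q_z and R weighting matrices, \gamma \in (0, 1] the discount factor, and \gamma_d the prescribed disturbance attenuation level.

    J(u, d) = \sum_{k=0}^{\infty} \gamma^k \left( z_k^\top Q_z z_k + u_k^\top R u_k - \gamma_d^2 \, d_k^\top d_k \right)

    Q^*(z_k, u_k, d_k) = z_k^\top Q_z z_k + u_k^\top R u_k - \gamma_d^2 \, d_k^\top d_k + \gamma \min_{u} \max_{d} Q^*(z_{k+1}, u, d)

Because the optimal Q function can be identified from measured trajectories through the deviation between its values at two consecutive time steps, the relation above is solvable (for instance, by least squares over a quadratic parameterization) without knowledge of the system matrices, which is what makes the on/off-policy algorithms model-free. The discount factor \gamma keeps J finite along a persistent reference trajectory, and \gamma_d plays the role of the attenuation bound in the modified game Riccati equation mentioned in the abstract.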
Pages: 20