Off-policy Q-learning based optimal tracking control for unknown linear discrete-time systems under deception attacks

Cited by: 0
Authors
Song, Xing-Xing [1 ]
Chu, Zhao-Bi [1 ]
Affiliations
[1] College of Electrical and Automation Engineering, Hefei University of Technology, Hefei
Source
Kongzhi yu Juece/Control and Decision | 2025, Vol. 40, No. 05
Keywords
deception attacks; off-policy Q-learning; optimal tracking; zero-sum game;
DOI
10.13195/j.kzyjc.2024.0830
Abstract
An off-policy Q-learning algorithm is proposed to solve the optimal tracking control problem for linear discrete-time systems with unknown dynamics under multiple deception attacks. Firstly, a weight matrix is introduced to model multiple deception attacks injected into the controller's communication channel, and an augmented tracking system is constructed with a reference command generator. Within the linear quadratic tracking framework, the optimal tracking control problem is formulated as a zero-sum game between the deception attacks and the control inputs. Then, an off-policy Q-learning algorithm based on state data is designed to learn the optimal tracking control gain, overcoming the difficulty that the control gain is hard to update to meet given requirements in practical applications. It is proved that the algorithm's solution is unbiased when the probe noise satisfies the persistence-of-excitation condition. Meanwhile, for the case where the system state cannot be measured, an off-policy Q-learning algorithm based on output data is designed. Finally, a tracking control simulation of an F-16 aircraft autopilot verifies the effectiveness of the designed off-policy Q-learning algorithms and the unbiasedness of their solutions with respect to the probe noise. © 2025 Northeast University. All rights reserved.
Pages: 1641-1650
Page count: 9
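
As a rough illustration of the zero-sum game formulation described in the abstract, the sketch below value-iterates the game Riccati equation for a known augmented model x_{k+1} = A x_k + B u_k + D w_k with stage cost x'Q x + u'R u - gamma^2 w'w, where u is the control input and w the deception attack. This is a minimal model-based sketch only; the paper's off-policy Q-learning algorithm learns the corresponding gains from measured state or output data without knowing A, B, D, and all matrices, weights, and function names below are hypothetical placeholders rather than the paper's F-16 model.

import numpy as np

def zero_sum_lqt_value_iteration(A, B, D, Qw, R, gamma, iters=200):
    # Augmented dynamics: x_{k+1} = A x_k + B u_k + D w_k
    # Stage cost: x' Qw x + u' R u - gamma^2 w' w (u: control, w: deception attack)
    n, m, q = A.shape[0], B.shape[1], D.shape[1]
    P = np.zeros((n, n))
    for _ in range(iters):
        # Blocks of the game Q-function kernel induced by the current value kernel P
        Ruu = R + B.T @ P @ B
        Rww = D.T @ P @ D - gamma**2 * np.eye(q)
        Ruw = B.T @ P @ D
        Su, Sw = B.T @ P @ A, D.T @ P @ A
        # Saddle-point conditions of both players, solved jointly
        M = np.block([[Ruu, Ruw], [Ruw.T, Rww]])
        S = np.vstack([Su, Sw])
        gains = np.linalg.solve(M, S)   # stacked [K_u; K_w]; u = -K_u x, w = -K_w x
        # Game Riccati (value-iteration) update
        P = Qw + A.T @ P @ A - S.T @ gains
    K_u, K_w = gains[:m], gains[m:]
    return P, K_u, K_w

# Example with hypothetical dimensions and weights:
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
D = np.array([[0.05], [0.0]])
P, K_u, K_w = zero_sum_lqt_value_iteration(A, B, D, Qw=np.eye(2), R=np.eye(1), gamma=5.0)

The point of the paper's data-driven formulation is to obtain the control gain K_u (and the worst-case attack gain K_w) without this model knowledge, using data collected under an arbitrary behaviour policy, with the learned solution shown to be unbiased with respect to the probe noise.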