Data-Driven $H_\infty$ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning

Cited: 22
Authors
Zhang, Li [1 ,2 ]
Fan, Jialu [1 ,2 ]
Xue, Wenqian [1 ,2 ]
Lopez, Victor G. [3 ]
Li, Jinna [4 ]
Chai, Tianyou [1 ,2 ]
Lewis, Frank L. [5 ]
Affiliations
[1] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
[2] Northeastern Univ, Int Joint Res Lab Integrated Automat, Shenyang 110819, Peoples R China
[3] Leibniz Univ Hannover, D-30167 Hannover, Germany
[4] Liaoning Petrochem Univ, Sch Informat & Control Engn, Fushun 113001, Peoples R China
[5] Univ Texas Arlington, UTA Res Inst, Arlington, TX 76118 USA
Keywords
Heuristic algorithms; Optimal control; Transmission line matrix methods; Process control; Performance analysis; Output feedback; Games; H∞ control; off-policy Q-learning; Q-learning; static output feedback (OPFB); zero-sum game; H-INFINITY CONTROL; ZERO-SUM GAMES; ADAPTIVE OPTIMAL-CONTROL; ALGORITHM; DESIGN;
DOI
10.1109/TNNLS.2021.3112457
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve the $H_\infty$ static OPFB control problem for linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed OPFB control algorithm form for completely unknown systems. Under the premise that the disturbance attenuation condition is satisfied, conditions for the existence of the optimal OPFB solution are given. The convergence of the proposed Q-learning methods, as well as the differences and equivalence between the two algorithms, are rigorously proven. Moreover, considering the effect of the probing noise required for the persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise, thereby avoiding biased solutions. Simulation results are presented to verify the effectiveness of the proposed approaches.
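The abstract frames $H_\infty$ control as a two-player zero-sum game between the controller and the disturbance. As a rough illustration of the underlying game Riccati recursion, the sketch below runs a model-based value iteration for a DT linear system $x_{k+1} = Ax_k + Bu_k + Ew_k$ with cost $\sum_k (x_k^\top Qx_k + u_k^\top Ru_k - \gamma^2 w_k^\top w_k)$. The matrices `A`, `B`, `E`, `Q`, `R` and the attenuation level `gamma` are illustrative assumptions, not taken from the paper; the paper's actual contribution is data-driven Q-learning that needs none of these model matrices.

```python
import numpy as np

# Illustrative system matrices (assumed for this sketch, not from the paper).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])   # control input channel
E = np.array([[1.0], [0.0]])   # disturbance input channel
Q = np.eye(2)
R = np.eye(1)
gamma = 5.0                    # disturbance attenuation level

# Value iteration on the game algebraic Riccati equation (GARE).
P = np.zeros((2, 2))
for _ in range(500):
    # Joint quadratic term for the stacked player vector [u; w].
    G = np.block([[R + B.T @ P @ B, B.T @ P @ E],
                  [E.T @ P @ B, E.T @ P @ E - gamma**2 * np.eye(1)]])
    H = np.vstack([B.T @ P @ A, E.T @ P @ A])
    P_next = Q + A.T @ P @ A - H.T @ np.linalg.solve(G, H)
    if np.linalg.norm(P_next - P) < 1e-10:
        P = P_next
        break
    P = P_next

# Saddle-point gains: u = -K x (controller), w = -L x (worst-case disturbance).
G = np.block([[R + B.T @ P @ B, B.T @ P @ E],
              [E.T @ P @ B, E.T @ P @ E - gamma**2 * np.eye(1)]])
H = np.vstack([B.T @ P @ A, E.T @ P @ A])
KL = np.linalg.solve(G, H)
K, L = KL[:1], KL[1:]
```

The data-driven algorithms in the paper recover the same kind of saddle-point solution from input/output data alone, replacing the state-feedback Riccati step with a Q-function estimated by least squares.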
Pages: 3553-3567
Page count: 15
Related Papers
50 records in total
  • [41] Online accelerated data-driven learning for optimal feedback control of discrete-time partially uncertain systems
    Somers, Luke
    Haddad, Wassim M.
    Kokolakis, Nick-Marios T.
    Vamvoudakis, Kyriakos G.
    INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING, 2024, 38 (03) : 848 - 876
  • [42] Data-driven optimal output regulation for unknown linear discrete-time systems based on parameterization approach
    Zhai, Ganghui
    Tian, Engang
    Luo, Yuqiang
    Liang, Dong
    APPLIED MATHEMATICS AND COMPUTATION, 2024, 461
  • [43] Robust H∞ tracking of linear discrete-time systems using Q-learning
    Valadbeigi, Amir Parviz
    Shu, Zhan
    Khaki Sedigh, Ali
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2023, 33 (10) : 5604 - 5623
  • [44] FINITE-HORIZON OPTIMAL CONTROL OF DISCRETE-TIME LINEAR SYSTEMS WITH COMPLETELY UNKNOWN DYNAMICS USING Q-LEARNING
    Zhao, Jingang
    Zhang, Chi
    JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION, 2021, 17 (03) : 1471 - 1483
  • [45] H∞ Tracking Control for Linear Discrete-Time Systems: Model-Free Q-Learning Designs
    Yang, Yunjie
    Wan, Yan
    Zhu, Jihong
    Lewis, Frank L.
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (01): : 175 - 180
  • [46] New Results on Output Feedback H∞ Control for Linear Discrete-Time Systems
    Chang, Xiao-Heng
    Yang, Guang-Hong
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2014, 59 (05) : 1355 - 1359
  • [47] Optimal Output Regulation of Linear Discrete-Time Systems With Unknown Dynamics Using Reinforcement Learning
    Jiang, Yi
    Kiumarsi, Bahare
    Fan, Jialu
    Chai, Tianyou
    Li, Jinna
    Lewis, Frank L.
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (07) : 3147 - 3156
  • [48] Online Adaptive Optimal Control of Discrete-time Linear Systems via Synchronous Q-learning
    Li, Xinxing
    Wang, Xueyuan
    Zha, Wenzhong
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 2024 - 2029
  • [49] Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics
    Kiumarsi, Bahare
    Lewis, Frank L.
    Modares, Hamidreza
    Karimpour, Ali
    Naghibi-Sistani, Mohammad-Bagher
    AUTOMATICA, 2014, 50 (04) : 1167 - 1175
  • [50] A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems
    Lin, Mingduo
    Liu, Derong
    Zhao, Bo
    Dai, Qionghai
    Dong, Yi
    2019 9TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST2019), 2019, : 6 - 10