Linear quadratic optimal control method based on output feedback inverse reinforcement Q-learning

Cited by: 0
Authors
Liu, Wen [1 ]
Fan, Jia-Lu [1 ]
Xue, Wen-Qian [1 ]
Affiliations
[1] State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, Liaoning
Source
Kongzhi Lilun Yu Yingyong/Control Theory and Applications | 2024, Vol. 41, No. 8
Funding
National Natural Science Foundation of China
Keywords
data-driven optimal control; inverse reinforcement learning; output feedback; Q-learning;
DOI
10.7641/CTA.2023.20551
Abstract
In this paper, a data-driven output feedback optimal control method based on inverse reinforcement Q-learning is proposed for the linear quadratic optimal control problem of linear discrete-time systems with unknown model parameters and unmeasurable states. Using only input and output data, the method adaptively determines appropriate quadratic performance index weights and the corresponding optimal control law, so that the system reproduces the given reference trajectories. First, a parameter-correction equation is derived; combining it with inverse optimal control yields a model-based inverse reinforcement learning optimal control framework that computes the corrections of the output feedback control law and of the performance index weights. On this basis, the idea of reinforcement Q-learning is introduced, and a data-driven output feedback inverse reinforcement Q-learning optimal control method is finally obtained, which requires no system model parameters and uses only historical input and output data to solve for the output feedback control law parameters and the performance index weights. Theoretical analysis and simulation experiments verify the effectiveness of the proposed method. © 2024 South China University of Technology. All rights reserved.
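The abstract describes a two-loop scheme: an inner model-free Q-learning loop that solves the forward LQ problem for the current weight guess, and an outer inverse reinforcement learning correction of the weights and control law from expert (reference) data. The Python sketch below is only a minimal illustration of the inner loop under assumptions that are not taken from the paper: full state measurement instead of the paper's output feedback from input-output histories, a hypothetical second-order system, and fixed performance-index weights; the outer inverse-RL weight correction and the output feedback parameterization are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

# System matrices: unknown to the learner, used here only to simulate data.
A = np.array([[0.95, 0.10],
              [0.00, 0.80]])
B = np.array([[0.0],
              [0.1]])
n, m = B.shape

# Fixed quadratic performance-index weights (in the paper, weights like these
# are what the outer inverse-RL correction adjusts; fixed here for illustration).
Qw = np.eye(n)
Rw = np.eye(m)

def quad_basis(z):
    # Regressors for z^T H z with symmetric H: z_i^2 on the diagonal,
    # 2*z_i*z_j for i < j, stacked in upper-triangular order.
    zz = np.outer(z, z)
    zz = zz + zz.T - np.diag(np.diag(zz))
    return zz[np.triu_indices(len(z))]

K = np.zeros((m, n))  # initial stabilizing policy u = -K x (A is stable here)
for it in range(20):
    # Collect closed-loop data under the current policy plus exploration noise.
    Phi, y = [], []
    x = rng.standard_normal(n)
    for t in range(200):
        u = -K @ x + 0.5 * rng.standard_normal(m)
        x_next = A @ x + B @ u
        cost = x @ Qw @ x + u @ Rw @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])
        # Q-function Bellman equation: z^T H z = cost + z_next^T H z_next.
        Phi.append(quad_basis(z) - quad_basis(z_next))
        y.append(cost)
        x = x_next if t % 40 else rng.standard_normal(n)  # occasional restart
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)

    # Rebuild the symmetric kernel H of the Q-function from its parameters.
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = H + H.T - np.diag(np.diag(H))

    # Policy improvement without any model knowledge: K = H_uu^{-1} H_ux.
    K_new = np.linalg.solve(H[n:, n:], H[n:, :n])
    if np.linalg.norm(K_new - K) < 1e-8:
        K = K_new
        break
    K = K_new

print("learned feedback gain K =", K)

Per the abstract, in the paper's setting the state vector in this sketch would be replaced by a vector built from historical inputs and outputs, and the weights Qw and Rw would themselves be updated by the proposed correction equation until the learner's trajectories match the reference trajectories.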
Pages: 1469-1479
Page count: 10