Model-Free Q-Learning for the Tracking Problem of Linear Discrete-Time Systems

Cited by: 13

Authors:

Li, Chun [1]

Ding, Jinliang [1]

Lewis, Frank L. [2]

Chai, Tianyou [1]

Affiliations:

[1] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China

[2] Univ Texas Arlington, Res Inst, Ft Worth, TX 76118 USA

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024 / Vol. 35 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Desired control input; iterative criteria; model-free; Q-learning; tracking problem; OPTIMAL OUTPUT REGULATION; ADAPTIVE OPTIMAL-CONTROL;

DOI:

10.1109/TNNLS.2022.3195357

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this article, a model-free Q-learning algorithm is proposed to solve the tracking problem of linear discrete-time systems with completely unknown system dynamics. To eliminate tracking errors, a performance index of the Q-learning approach is formulated, which can transform the tracking problem into a regulation one. Compared with the existing adaptive dynamic programming (ADP) methods and Q-learning approaches, the proposed performance index adds a product term composed of a gain matrix and the reference tracking trajectory to the control input quadratic form. In addition, without requiring any prior knowledge of the dynamics of the original controlled system and command generator, the control policy obtained by the proposed approach can be deduced by an iterative technique relying on the online information of the system state, the control input, and the reference tracking trajectory. In each iteration of the proposed method, the desired control input can be updated by the iterative criteria derived from a precondition of the controlled system and the reference tracking trajectory, which ensures that the obtained control policy can eliminate tracking errors in theory. Moreover, to effectively use less data to obtain the optimal control policy, the off-policy approach is introduced into the proposed algorithm. Finally, the effectiveness of the proposed algorithm is verified by a numerical simulation.
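As background for the abstract, the following is a minimal sketch of generic model-free Q-learning (policy iteration with least squares) for the plain regulation problem of a linear discrete-time system. It is not the tracking algorithm proposed in the article: the performance index with the gain-matrix/reference term and the desired-control-input update are not reproduced, because the abstract does not specify them. The matrices A, B, Qc, Rc and all function names are illustrative assumptions; the learner only sees data returned by `step`, and `riccati_gain` uses the model solely as a check.

```python
import numpy as np

def quad_features(z):
    # Monomials z_i*z_j (i <= j), off-diagonals doubled, so that
    # z^T H z == quad_features(z) @ theta with theta = upper triangle of H.
    i, j = np.triu_indices(len(z))
    return np.where(i == j, 1.0, 2.0) * z[i] * z[j]

def theta_to_H(theta, dim):
    # Rebuild the symmetric Q-function kernel H from its upper triangle.
    H = np.zeros((dim, dim))
    i, j = np.triu_indices(dim)
    H[i, j] = theta
    H[j, i] = theta
    return H

def q_learning_lqr(step, n, m, Qc, Rc, K0, iters=20, samples=60, seed=0):
    # Policy iteration on Q(x, u) = [x; u]^T H [x; u], using data only.
    rng = np.random.default_rng(seed)
    K = K0.copy()
    for _ in range(iters):
        Phi, cost = [], []
        for _ in range(samples):
            x = rng.normal(size=n)
            u = -K @ x + rng.normal(size=m)   # exploratory behaviour action
            x_next = step(x, u)               # observed next state
            u_next = -K @ x_next              # action of the policy being evaluated
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, u_next])
            # Bellman equation: Q(z) - Q(z_next) = stage cost
            Phi.append(quad_features(z) - quad_features(z_next))
            cost.append(x @ Qc @ x + u @ Rc @ u)
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(cost), rcond=None)
        H = theta_to_H(theta, n + m)
        K = np.linalg.solve(H[n:, n:], H[n:, :n])  # policy improvement
    return K, H

def riccati_gain(A, B, Qc, Rc, iters=2000):
    # Model-based reference gain via Riccati value iteration (check only).
    P = Qc.copy()
    for _ in range(iters):
        G = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
        P = Qc + A.T @ P @ A - A.T @ P @ B @ G
    return np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)

# Illustrative stable system (assumed values), so K0 = 0 is stabilizing.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.eye(1)

def step(x, u):
    # Stands in for the unknown plant; the learner never reads A or B.
    return A @ x + B @ u

K_learned, H_learned = q_learning_lqr(step, 2, 1, Qc, Rc, np.zeros((1, 2)))
```

Because the action at the next step is taken from the policy under evaluation while the current action carries exploration noise, the same data-collection scheme works with any sufficiently exciting behaviour input, which is the usual motivation for off-policy variants.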


Pages: 3191-3201

Page count: 11
