Optimal Output Regulation of Linear Discrete-Time Systems With Unknown Dynamics Using Reinforcement Learning

Cited by: 119
Authors
Jiang, Yi [1 ,2 ,3 ]
Kiumarsi, Bahare [4 ]
Fan, Jialu [1 ,2 ]
Chai, Tianyou [1 ,2 ]
Li, Jinna [5 ]
Lewis, Frank L. [1 ,2 ,6 ]
Affiliations
[1] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
[2] Northeastern Univ, Int Joint Res Lab Integrated Automat, Shenyang 110819, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2V4, Canada
[4] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48824 USA
[5] Liaoning Shihua Univ, Sch Informat & Control Engn, Fushun 113001, Peoples R China
[6] Univ Texas Arlington, UTA Res Inst, Arlington, TX 76118 USA
Funding
National Natural Science Foundation of China;
Keywords
Optimization; Heuristic algorithms; Mathematical model; System dynamics; Optimal control; Automation; Reinforcement learning; Discrete-time (DT) systems; model-free; optimal output regulation; reinforcement learning (RL); OPTIMAL TRACKING CONTROL; ADAPTIVE OPTIMAL-CONTROL; H-INFINITY CONTROL; ZERO-SUM GAMES; SERVOMECHANISM PROBLEM; ROBUST-CONTROL;
DOI
10.1109/TCYB.2018.2890046
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
This paper presents a model-free optimal approach based on reinforcement learning for solving the output regulation problem for discrete-time systems under disturbances. This problem is first broken down into two optimization problems: 1) a constrained static optimization problem is established to find the solution to the output regulator equations (i.e., the feedforward control input) and 2) a dynamic optimization problem is established to find the optimal feedback control input. Solving these optimization problems requires the knowledge of the system dynamics. To obviate this requirement, a model-free off-policy algorithm is presented to find the solution to the dynamic optimization problem using only measured data. Then, based on the solution to the dynamic optimization problem, a model-free approach is provided for the static optimization problem. It is shown that the proposed algorithm is insensitive to the probing noise added to the control input for satisfying the persistence of excitation condition. Simulation results are provided to verify the effectiveness of the proposed approach.
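The static optimization step described in the abstract reduces, in the model-based case, to solving the linear output regulator equations XE = AX + BU + D and 0 = CX + F for the feedforward gain. As a minimal sketch of that baseline (not the paper's model-free method, and with purely illustrative matrices), the equations can be vectorized with Kronecker products and solved as one linear system:

```python
import numpy as np

# Illustrative plant and exosystem (not from the paper's example):
#   x(k+1) = A x(k) + B u(k) + D w(k),   e(k) = C x(k) + F w(k),
#   exosystem w(k+1) = E w(k)  (here a constant reference/disturbance).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.1], [0.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[-1.0]])
E = np.array([[1.0]])

n, m, q = A.shape[0], B.shape[1], E.shape[0]

# Regulator equations:  X E = A X + B U + D,   0 = C X + F.
# Using vec(M X N) = (N^T kron M) vec(X), stack them as M [vec(X); vec(U)] = rhs.
M = np.block([
    [np.kron(E.T, np.eye(n)) - np.kron(np.eye(q), A), -np.kron(np.eye(q), B)],
    [np.kron(np.eye(q), C), np.zeros((C.shape[0] * q, m * q))],
])
rhs = np.concatenate([D.flatten(order="F"), -F.flatten(order="F")])
sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)
X = sol[: n * q].reshape((n, q), order="F")
U = sol[n * q:].reshape((m, q), order="F")

# The feedforward input u*(k) = U w(k) holds the tracking error at zero
# on the invariant subspace x = X w; verify both regulator equations:
print(np.allclose(X @ E, A @ X + B @ U + D), np.allclose(C @ X + F, 0))
# prints: True True
```

Solving these equations requires (A, B, C, D, E, F); the paper's contribution is recovering the same feedforward solution from measured data alone, without this model knowledge.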
Pages: 3147-3156
Page count: 10