Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems

Cited by: 109
Authors
Li, Jinna [1 ,2 ]
Chai, Tianyou [1 ,3 ]
Lewis, Frank L. [1 ,3 ,4 ]
Ding, Zhengtao [5 ]
Jiang, Yi [1 ,3 ]
Affiliations
[1] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Liaoning, Peoples R China
[2] Liaoning Shihua Univ, Sch Informat & Control Engn, Fushun 113001, Peoples R China
[3] Northeastern Univ, Int Joint Res Lab Integrated Automat, Shenyang 110819, Liaoning, Peoples R China
[4] Univ Texas Arlington, UTA Res Inst, Arlington, TX 76118 USA
[5] Univ Manchester, Sch Elect & Elect Engn, Manchester M13 9PL, Lancs, England
Funding
National Natural Science Foundation of China;
Keywords
Affine nonlinear systems; interleaved learning; off-policy learning; optimal control; Q-learning; H-INFINITY CONTROL; OPTIMAL OPERATIONAL CONTROL; ADAPTIVE OPTIMAL-CONTROL; LINEAR-SYSTEMS; DESIGN; ITERATION; GAMES;
DOI
10.1109/TNNLS.2018.2861945
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, a novel off-policy interleaved Q-learning algorithm is presented for solving the optimal control problem of affine nonlinear discrete-time (DT) systems, using only measured data along the system trajectories. The affine nonlinear nature of the systems, their unknown dynamics, and the off-policy learning approach pose tremendous challenges to approximating optimal controllers. To this end, the on-policy Q-learning method for optimal control of affine nonlinear DT systems is reviewed first, and its convergence is rigorously proven. The bias in the solution of the Q-function-based Bellman equation, caused by adding probing noise to the system to satisfy the persistent excitation condition, is also analyzed for the on-policy Q-learning approach. Then, a behavior control policy is introduced, followed by the proposed off-policy Q-learning algorithm; the convergence of this algorithm, and the absence of bias in the solution of the optimal control problem when probing noise is added to the system, are both established. Third, three neural networks are run by the interleaved Q-learning approach within the actor-critic framework, yielding a novel off-policy interleaved Q-learning algorithm whose convergence is proven. Simulation results are given to verify the effectiveness of the proposed method.
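The off-policy idea summarized in the abstract (collect data once under a fixed behavior policy with probing noise, then reuse that same batch to evaluate and improve a different target policy via a Q-function Bellman equation) can be illustrated with a minimal sketch. The example below is not the paper's algorithm: it uses a scalar linear system (a special case of an affine system f(x) + g(x)u) for which the Q-function is exactly quadratic, and solves the Bellman equation by least squares instead of the paper's three neural networks. All gains and parameters are illustrative assumptions.

```python
import numpy as np

# Assumed toy setup (not from the paper): scalar linear system
# x_{k+1} = a*x_k + b*u_k with quadratic stage cost q*x^2 + r_u*u^2.
a, b = 0.9, 0.5
q, r_u = 1.0, 1.0
rng = np.random.default_rng(0)

# Collect ONE batch of data under a fixed behavior policy plus
# probing noise; the same batch is reused at every iteration, so
# the learned (target) policy differs from the data-generating one.
K_behavior = -0.3
x, data = 1.0, []
for _ in range(200):
    u = K_behavior * x + 0.5 * rng.standard_normal()  # probing noise
    x_next = a * x + b * u
    data.append((x, u, x_next))
    x = x_next if abs(x_next) < 10 else rng.standard_normal()

def phi(x, u):
    # Quadratic basis: Q(x, u) = [x u] H [x u]^T has 3 free parameters.
    return np.array([x * x, 2 * x * u, u * u])

# Policy iteration: evaluate the current target policy K from the
# off-policy batch (least squares on the Q-Bellman equation
# Q(x_k, u_k) = r_k + Q(x_{k+1}, K*x_{k+1})), then improve K greedily.
K = 0.0  # initial admissible (stabilizing) target policy
for _ in range(20):
    A_ls = np.array([phi(xk, uk) - phi(xk1, K * xk1)
                     for (xk, uk, xk1) in data])
    b_ls = np.array([q * xk * xk + r_u * uk * uk
                     for (xk, uk, _) in data])
    H_xx, H_xu, H_uu = np.linalg.lstsq(A_ls, b_ls, rcond=None)[0]
    K = -H_xu / H_uu  # improved target policy u = K*x

print("learned gain K =", K, " closed-loop pole =", a + b * K)
```

For this linear special case the procedure recovers the LQR gain, and no bias is introduced by the probing noise because the noisy input u_k enters the regression directly, which mirrors the off-policy property argued in the abstract.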
Pages: 1308-1320
Page count: 13