Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems

Cited by: 109
Authors
Li, Jinna [1 ,2 ]
Chai, Tianyou [1 ,3 ]
Lewis, Frank L. [1 ,3 ,4 ]
Ding, Zhengtao [5 ]
Jiang, Yi [1 ,3 ]
Affiliations
[1] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Liaoning, Peoples R China
[2] Liaoning Shihua Univ, Sch Informat & Control Engn, Fushun 113001, Peoples R China
[3] Northeastern Univ, Int Joint Res Lab Integrated Automat, Shenyang 110819, Liaoning, Peoples R China
[4] Univ Texas Arlington, UTA Res Inst, Arlington, TX 76118 USA
[5] Univ Manchester, Sch Elect & Elect Engn, Manchester M13 9PL, Lancs, England
Funding
National Natural Science Foundation of China;
Keywords
Affine nonlinear systems; interleaved learning; off-policy learning; optimal control; Q-learning; H-INFINITY CONTROL; OPTIMAL OPERATIONAL CONTROL; ADAPTIVE OPTIMAL-CONTROL; LINEAR-SYSTEMS; DESIGN; ITERATION; GAMES;
DOI
10.1109/TNNLS.2018.2861945
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, a novel off-policy interleaved Q-learning algorithm is presented for solving the optimal control problem of affine nonlinear discrete-time (DT) systems, using only measured data along the system trajectories. The affine nonlinear nature of the systems, their unknown dynamics, and the off-policy learning approach pose tremendous challenges to approximating optimal controllers. To this end, the on-policy Q-learning method for optimal control of affine nonlinear DT systems is reviewed first, and its convergence is rigorously proven. The bias in the solution of the Q-function-based Bellman equation, caused by adding probing noise to the system to satisfy the persistent excitation condition, is also analyzed for the on-policy Q-learning approach. Then, a behavior control policy is introduced, followed by the proposed off-policy Q-learning algorithm; the convergence of this algorithm, and the absence of bias in the solution of the optimal control problem when probing noise is added to the system, are both established. Third, three neural networks are run by the interleaved Q-learning approach within the actor-critic framework, yielding a novel off-policy interleaved Q-learning algorithm whose convergence is proven. Simulation results are given to verify the effectiveness of the proposed method.
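The off-policy idea summarized in the abstract (collect data once under a fixed behavior policy with probing noise, then reuse that same batch to evaluate and improve a different target policy via a Q-function Bellman equation) can be illustrated with a minimal sketch. The example below is not the paper's algorithm: it uses a scalar linear system (a special case of an affine system f(x) + g(x)u) for which the Q-function is exactly quadratic, and solves the Bellman equation by least squares instead of the paper's three neural networks. All gains and parameters are illustrative assumptions.

```python
import numpy as np

# Assumed toy setup (not from the paper): scalar linear system
# x_{k+1} = a*x_k + b*u_k with quadratic stage cost q*x^2 + r_u*u^2.
a, b = 0.9, 0.5
q, r_u = 1.0, 1.0
rng = np.random.default_rng(0)

# Collect ONE batch of data under a fixed behavior policy plus
# probing noise; the same batch is reused at every iteration, so
# the learned (target) policy differs from the data-generating one.
K_behavior = -0.3
x, data = 1.0, []
for _ in range(200):
    u = K_behavior * x + 0.5 * rng.standard_normal()  # probing noise
    x_next = a * x + b * u
    data.append((x, u, x_next))
    x = x_next if abs(x_next) < 10 else rng.standard_normal()

def phi(x, u):
    # Quadratic basis: Q(x, u) = [x u] H [x u]^T has 3 free parameters.
    return np.array([x * x, 2 * x * u, u * u])

# Policy iteration: evaluate the current target policy K from the
# off-policy batch (least squares on the Q-Bellman equation
# Q(x_k, u_k) = r_k + Q(x_{k+1}, K*x_{k+1})), then improve K greedily.
K = 0.0  # initial admissible (stabilizing) target policy
for _ in range(20):
    A_ls = np.array([phi(xk, uk) - phi(xk1, K * xk1)
                     for (xk, uk, xk1) in data])
    b_ls = np.array([q * xk * xk + r_u * uk * uk
                     for (xk, uk, _) in data])
    H_xx, H_xu, H_uu = np.linalg.lstsq(A_ls, b_ls, rcond=None)[0]
    K = -H_xu / H_uu  # improved target policy u = K*x

print("learned gain K =", K, " closed-loop pole =", a + b * K)
```

For this linear special case the procedure recovers the LQR gain, and no bias is introduced by the probing noise because the noisy input u_k enters the regression directly, which mirrors the off-policy property argued in the abstract.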
Pages: 1308-1320
Page count: 13