Reinforcement learning and adaptive optimization of a class of Markov jump systems with completely unknown dynamic information

Times cited: 59
Authors
He, Shuping [1 ,2 ]
Zhang, Maoguang [1 ]
Fang, Haiyang [1 ]
Liu, Fei [3 ]
Luan, Xiaoli [3 ]
Ding, Zhengtao [4 ]
Affiliations
[1] Anhui Univ, Sch Elect Engn & Automat, Hefei 230601, Peoples R China
[2] Anhui Univ, Inst Phys Sci & Informat Technol, Hefei 230601, Peoples R China
[3] Jiangnan Univ, Inst Automat, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Jiangsu, Peoples R China
[4] Univ Manchester, Sch Elect & Elect Engn, Manchester M13 9PL, Lancs, England
Funding
National Natural Science Foundation of China;
Keywords
Markov jump linear systems (MJLSs); Adaptive optimal control; Online; Reinforcement learning (RL); Coupled algebraic Riccati equations (AREs); DISCRETE-TIME-SYSTEMS; SLIDING MODE CONTROL; DESIGN; ALGORITHM;
DOI
10.1007/s00521-019-04180-2
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, the online adaptive optimal control problem for a class of continuous-time Markov jump linear systems (MJLSs) is investigated using a parallel reinforcement learning (RL) algorithm under completely unknown dynamics. Before the state and input information of the subsystems is collected and learned, exploration noise is first added to describe the actual control input. Then, a novel parallel RL algorithm is used to solve the corresponding N coupled algebraic Riccati equations in parallel by online learning. With this algorithm, no knowledge of the dynamic information of the MJLSs is required. The convergence of the proposed algorithm is also proved. Finally, the effectiveness and applicability of the novel algorithm are illustrated by two simulation examples.
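For orientation only, the sketch below shows the mode-coupled policy-iteration structure that underlies such algorithms: each mode i of the MJLS has an algebraic Riccati equation coupled to the other modes through the transition-rate matrix, and policy evaluation/improvement sweeps over all N modes at once. This is a minimal model-based stand-in (it uses known A_i, B_i), not the authors' data-driven parallel RL algorithm, which estimates the same quantities from measured states and inputs; the function name coupled_pi_step, the variable Pi for the transition-rate matrix, and the fixed-point sweep count are illustrative assumptions.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def coupled_pi_step(A, B, Q, R, Pi, K, sweeps=50):
    # One model-based policy-iteration step for the N coupled AREs of a
    # continuous-time MJLS. A, B, Q, R are length-N lists of per-mode
    # matrices, Pi is the N x N transition-rate matrix (rows sum to 0),
    # and K is the list of current mode-dependent stabilizing gains.
    N, n = len(A), A[0].shape[0]
    P = [np.zeros((n, n)) for _ in range(N)]
    # Policy evaluation: fixed-point sweeps over the coupled Lyapunov
    # equations (A_i - B_i K_i)' P_i + P_i (A_i - B_i K_i)
    #   + Q_i + K_i' R_i K_i + sum_j pi_ij P_j = 0,
    # with the pi_ii P_i term absorbed into the closed-loop matrix.
    for _ in range(sweeps):
        P_next = []
        for i in range(N):
            Ac = A[i] - B[i] @ K[i] + 0.5 * Pi[i, i] * np.eye(n)
            W = Q[i] + K[i].T @ R[i] @ K[i] \
                + sum(Pi[i, j] * P[j] for j in range(N) if j != i)
            # solve_continuous_lyapunov(a, q) solves a X + X a' = q,
            # so this returns P_i satisfying Ac' P_i + P_i Ac = -W.
            P_next.append(solve_continuous_lyapunov(Ac.T, -W))
        P = P_next
    # Policy improvement: K_i <- R_i^{-1} B_i' P_i for each mode.
    K_new = [np.linalg.solve(R[i], B[i].T @ P[i]) for i in range(N)]
    return P, K_new

In the paper's model-free setting, the Lyapunov-equation sweeps would be replaced by least-squares identification of P_i and the improved gains from online trajectory data excited by exploration noise; the sketch only exhibits the per-mode coupling that the parallel computation exploits.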
Pages: 14311-14320
Page count: 10