Imitation-Based Reinforcement Learning for Markov Jump Systems and Its Application

Cited by: 3
Authors
Wu, Jiacheng [1 ,2 ]
Wang, Jing [3 ,4 ]
Shen, Hao [3 ,4 ]
Park, Ju H. [5 ]
Affiliations
[1] Zhejiang Univ, Inst Cyber Syst & Control, State Key Lab Ind Control Technol, Hangzhou 310027, Peoples R China
[2] Anhui Univ Technol, Sch Elect & Informat Engn, Maanshan 243002, Peoples R China
[3] Anhui Univ Technol, Anhui Prov Key Lab Power Elect & Mot Control, Maanshan 243002, Peoples R China
[4] Anhui Univ Technol, Sch Elect & Informat Engn, Maanshan 243002, Peoples R China
[5] Yeungnam Univ, Dept Elect Engn, Gyongsan 38541, South Korea
Funding
National Natural Science Foundation of China;
Keywords
Markov jump systems; optimal control; zero-sum game; reinforcement learning; imitation learning;
DOI
10.1109/TCSI.2024.3387914
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline code
0808; 0809;
Abstract
In this paper, the imitation reinforcement learning-based control problem is studied for discrete-time Markov jump systems subject to external disturbances. First, the zero-sum game method is introduced to handle the external disturbances, in which the control input and the external disturbance are regarded as two rival players in an adversarial environment. Then, the imitation reinforcement learning problem is formulated, in which learner Markov jump systems aim to learn the optimal behavior of expert Markov jump systems. When the dynamics of both the learner systems and the expert systems are accurately known, an offline parallel imitation learning algorithm is designed for the learner systems to mimic the expert behavior; it consists of three steps: 1) policy evaluation, 2) search for the weight matrix, and 3) policy improvement. On this basis, by observing the optimal behavior of the expert systems, an online imitation reinforcement learning algorithm is presented for learner systems with completely unknown dynamics. Moreover, rigorous convergence and stability analyses are provided to guarantee the performance of the proposed algorithms. Finally, the effectiveness of the proposed method is verified on a single-machine infinite-bus power system.
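The policy-evaluation/policy-improvement loop described in the abstract operates on coupled game-theoretic Riccati equations, one per Markov mode. As a rough single-mode sketch (this is not the paper's algorithm — the system matrices, cost weights, and disturbance-attenuation level gamma below are illustrative assumptions), the underlying zero-sum LQ value iteration, where the controller u and disturbance w are the two rival players, can be written as:

```python
# Minimal sketch: value iteration for a discrete-time zero-sum LQ game
#   x_{k+1} = A x_k + B u_k + D w_k,  cost = sum x'Qx + u'Ru - gamma^2 w'w.
# All matrices here are hypothetical examples, not from the cited paper.
import numpy as np

def zero_sum_riccati(A, B, D, Q, R, gamma, iters=500, tol=1e-10):
    """Iterate P <- Q + A'PA - S' M^{-1} S with
       M = [[R + B'PB, B'PD], [D'PB, D'PD - gamma^2 I]],  S = [B'PA; D'PA],
    the saddle-point Bellman recursion for the zero-sum game."""
    n, m, q = A.shape[0], B.shape[1], D.shape[1]

    def blocks(P):
        M = np.block([[R + B.T @ P @ B, B.T @ P @ D],
                      [D.T @ P @ B, D.T @ P @ D - gamma**2 * np.eye(q)]])
        S = np.vstack([B.T @ P @ A, D.T @ P @ A])
        return M, S

    P = np.zeros((n, n))
    for _ in range(iters):
        M, S = blocks(P)
        P_new = Q + A.T @ P @ A - S.T @ np.linalg.solve(M, S)
        done = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if done:
            break
    # Saddle-point policies at the converged P: u = -K x, w = -L x.
    M, S = blocks(P)
    KL = np.linalg.solve(M, S)
    return P, KL[:m], KL[m:]

# Illustrative stable plant with a small disturbance channel.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.1], [0.0]])
P, K, L = zero_sum_riccati(A, B, D, np.eye(2), np.eye(1), gamma=5.0)
A_cl = A - B @ K - D @ L  # closed loop under both players' policies
```

In the Markov jump setting of the paper, one such recursion runs for each mode, coupled through the transition probabilities, and the online algorithm replaces the model-based blocks with data observed from the expert system.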
Pages: 3810-3819 (10 pages)