Solving the Zero-Sum Control Problem for Tidal Turbine System: An Online Reinforcement Learning Approach

Cited by: 29

Authors:

Fang, Haiyang [1,2]

Zhang, Maoguang [1]

He, Shuping [1]

Luan, Xiaoli [3]

Liu, Fei [3]

Ding, Zhengtao [4]

Affiliations:

[1] Anhui Univ, Sch Elect Engn & Automat, Anhui Engn Lab Human Robot Integrat Syst & Intell, Hefei 230601, Peoples R China

[2] Chinese Univ Hong Kong, Dept Mech & Automat Engn, Hong Kong, Peoples R China

[3] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China

[4] Univ Manchester, Dept Elect & Elect Engn, Manchester M13 9PL, Lancs, England

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2023, Vol. 53, No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Turbines; Reinforcement learning; Markov processes; Rotors; Optimal control; Games; Mathematical models; Game-coupled algebraic Riccati equations; integral reinforcement learning; Markov jump linear systems (MJLSs); tidal turbine; zero-sum games; ADAPTIVE OPTIMAL-CONTROL; TIME NONLINEAR-SYSTEMS; JUMP LINEAR-SYSTEMS; LYAPUNOV ITERATIONS; GAMES; PITCH; ALGORITHM; ROBUST;

DOI:

10.1109/TCYB.2022.3186886

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

A novel completely mode-free integral reinforcement learning (CMFIRL)-based iteration algorithm is proposed in this article to solve the two-player zero-sum game and the associated Nash equilibrium problem, that is, to find the optimal control policy pair, for a tidal turbine system described by a continuous-time Markov jump linear model with exact transition probabilities and completely unknown dynamics. First, the tidal turbine system is modeled as a Markov jump linear system, and a subsystem transformation technique is designed to decouple the jumping modes. Then, a completely mode-free reinforcement learning algorithm is employed to solve the game-coupled algebraic Riccati equations without using information about the system dynamics, in order to reach the Nash equilibrium. The learning algorithm consists of a single iteration loop that updates the control policy and the disturbance policy simultaneously. In addition, an exploration signal is added to excite the system, and the convergence of the CMFIRL iteration algorithm is rigorously proved. Finally, a simulation example is given to illustrate the effectiveness and applicability of the control design approach.


Pages: 7635-7647

Page count: 13

