Fuzzy-Based Adaptive Optimization of Unknown Discrete-Time Nonlinear Markov Jump Systems With Off-Policy Reinforcement Learning

Cited by: 34
Authors
Fang, Haiyang [1, 2]
Tu, Yidong [1 ]
Wang, Hai [3, 4]
He, Shuping [1, 6]
Liu, Fei [5 ]
Ding, Zhengtao [6 ]
Cheng, Shing Shin [2 ]
Affiliations
[1] Anhui Univ, Sch Elect Engn & Automat, Anhui Engn Lab Human Robot Integrat Syst & Intelli, Hefei, Peoples R China
[2] Chinese Univ Hong Kong, Dept Mech & Automat Engn, Hong Kong, Peoples R China
[3] Murdoch Univ, Discipline Engn & Energy, Murdoch, WA, Australia
[4] Murdoch Univ, Ctr Water Energy & Waste, Murdoch, WA, Australia
[5] Jiangnan Univ, Minist Educ, Key Lab Adv Proc Control Light Ind, Wuxi, Peoples R China
[6] Univ Manchester, Sch Elect & Elect Engn, Manchester, England
Funding
National Natural Science Foundation of China
Keywords
Fuzzy coupled algebraic Riccati equations (FCAREs); off-policy iteration; reinforcement learning (RL); Takagi-Sugeno (T-S) fuzzy models; tracking control; design
DOI
10.1109/TFUZZ.2022.3171844
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article presents a novel adaptive optimal control strategy for a class of discrete-time nonlinear Markov jump systems (DTNMJSs) based on Takagi-Sugeno (T-S) fuzzy models and reinforcement learning (RL). First, the original nonlinear system is represented by a fuzzy approximation, so that the optimal control problem becomes equivalent to designing fuzzy controllers for linear fuzzy systems with Markov jumping parameters. Next, the fuzzy coupled algebraic Riccati equations (FCAREs) for the fuzzy-based discrete-time linear Markov jump systems are derived using Hamiltonian-Bellman methods. An online fuzzy optimization algorithm for DTNMJSs is then given, together with the associated equivalence proof. A fully model-free off-policy fuzzy RL algorithm with proven convergence is then derived for the DTNMJSs; it requires no knowledge of the system dynamics or the transition probabilities. Finally, two simulation examples, a single-link robotic arm and a half-car active suspension, verify the effectiveness of the proposed approach.
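For orientation, the coupled algebraic Riccati equations mentioned in the abstract can be illustrated on a plain (non-fuzzy) discrete-time Markov jump linear system. The sketch below is not the article's algorithm: it is a model-based value iteration that uses the system matrices and transition probabilities directly, whereas the article's off-policy method is model-free, and it omits the T-S fuzzy blending entirely. The function name `solve_coupled_riccati`, the two-mode matrices, and the transition matrix `Pi` are all hypothetical values chosen for illustration.

```python
import numpy as np

def solve_coupled_riccati(A, B, Q, R, Pi, iters=2000, tol=1e-10):
    """Value iteration for the coupled Riccati equations of a
    discrete-time Markov jump linear system (model-based sketch).

    For each mode i, with E_i = sum_j Pi[i, j] * P_j:
      P_i = Q_i + A_i' E_i A_i - A_i' E_i B_i K_i,
      K_i = (R_i + B_i' E_i B_i)^{-1} B_i' E_i A_i,  u = -K_i x.
    """
    N = len(A)
    n = A[0].shape[0]

    def gain(i, P):
        # Mode-coupled expectation of the next-step cost matrix
        E = sum(Pi[i, j] * P[j] for j in range(N))
        K = np.linalg.solve(R[i] + B[i].T @ E @ B[i], B[i].T @ E @ A[i])
        return E, K

    P = [np.zeros((n, n)) for _ in range(N)]
    for _ in range(iters):
        P_new = []
        for i in range(N):
            E, K = gain(i, P)
            P_new.append(Q[i] + A[i].T @ E @ A[i] - A[i].T @ E @ B[i] @ K)
        diff = max(np.max(np.abs(a - b)) for a, b in zip(P_new, P))
        P = P_new
        if diff < tol:
            break
    K = [gain(i, P)[1] for i in range(N)]
    return P, K

# Hypothetical two-mode example (values are assumptions, not from the article)
A = [np.array([[0.9, 0.1], [0.0, 0.8]]), np.array([[1.1, 0.0], [0.2, 0.7]])]
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
Q = [np.eye(2), np.eye(2)]
R = [np.eye(1), np.eye(1)]
Pi = np.array([[0.9, 0.1], [0.3, 0.7]])  # mode transition probabilities

P, K = solve_coupled_riccati(A, B, Q, R, Pi)
```

Each `K[i]` is the state-feedback gain applied while the system is in mode `i`. In a T-S fuzzy setting one would hold such matrices per fuzzy rule and blend them with membership weights, which this sketch leaves out.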
Pages: 5276-5290
Page count: 15
Related papers
34 in total
[1]   Deep Reinforcement Learning: A Brief Survey [J].
Arulkumaran, Kai ;
Deisenroth, Marc Peter ;
Brundage, Miles ;
Bharath, Anil Anthony .
IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) :26-38
[2]   Quadratic stability analysis and design of continuous-time fuzzy control systems [J].
Cao, SG ;
Rees, NW ;
Feng, G .
INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 1996, 27 (02) :193-203
[3]   Adaptive Fuzzy Control of a Class of Nonlinear Systems by Fuzzy Approximation Approach [J].
Chen, Bing ;
Liu, Xiaoping P. ;
Ge, Shuzhi Sam ;
Lin, Chong .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2012, 20 (06) :1012-1021
[4]   Fuzzy Intermittent Extended Dissipative Control for Delayed Distributed Parameter Systems With Stochastic Disturbance: A Spatial Point Sampling Approach [J].
Ding, Kui ;
Zhu, Quanxin .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2022, 30 (06) :1734-1749
[5]   Quantized Control of Markov Jump Nonlinear Systems Based on Fuzzy Hidden Markov Model [J].
Dong, Shanling ;
Wu, Zheng-Guang ;
Shi, Peng ;
Su, Hongye ;
Huang, Tingwen .
IEEE TRANSACTIONS ON CYBERNETICS, 2019, 49 (07) :2420-2430
[6]   Driankov, Dimiter. An Introduction to Fuzzy Control, 2013
[7]   T-S Fuzzy Sampled-Data Control for Nonlinear Systems With Actuator Faults and Its Application to Wind Energy System [J].
Gandhi, Velmurugan ;
Joo, Young Hoon .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2022, 30 (02) :462-474
[8]   Performance-Based Fault Detection and Fault-Tolerant Control for Nonlinear Systems With T-S Fuzzy Implementation [J].
Han, Huayun ;
Yang, Ying ;
Li, Linlin ;
Ding, Steven X. .
IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (02) :801-814
[9]   Adaptive Optimal Control for a Class of Nonlinear Systems: The Online Policy Iteration Approach [J].
He, Shuping ;
Fang, Haiyang ;
Zhang, Maoguang ;
Liu, Fei ;
Ding, Zhengtao .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (02) :549-558
[10]   Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers [J].
Lewis, Frank L. ;
Vrabie, Draguna ;
Vamvoudakis, Kyriakos G. .
IEEE CONTROL SYSTEMS MAGAZINE, 2012, 32 (06) :76-105