Reinforcement learning-based optimal control for Markov jump systems with completely unknown dynamics

Cited by: 6
Authors
Shi, Xiongtao [1 ,2 ]
Li, Yanjie [1 ,2 ]
Du, Chenglong [3 ]
Chen, Chaoyang [4 ]
Zong, Guangdeng [5 ]
Gui, Weihua [3 ]
Affiliations
[1] Harbin Inst Technol Shenzhen, Guangdong Key Lab Intelligent Morphing Mech & Adap, Shenzhen 518055, Peoples R China
[2] Harbin Inst Technol Shenzhen, Sch Mech Engn & Automat, Shenzhen 518055, Peoples R China
[3] Cent South Univ, Sch Automat, Changsha 410083, Peoples R China
[4] Hunan Univ Sci & Technol, Sch Informat & Elect Engn, Xiangtan 411201, Peoples R China
[5] Tiangong Univ, Sch Control Sci & Engn, Tianjin 300387, Peoples R China
Keywords
Markov jump systems; Optimal control; Coupled algebraic Riccati equation; Parallel policy iteration; Reinforcement learning; ADAPTIVE OPTIMAL-CONTROL; TRACKING CONTROL; LINEAR-SYSTEMS;
DOI
10.1016/j.automatica.2024.111886
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper investigates the optimal control problem for a class of unknown Markov jump systems (MJSs) using parallel policy iteration-based reinforcement learning (PPI-RL) algorithms. First, a model-based PPI-RL algorithm is studied for MJSs with known dynamics: by solving the linear parallel Lyapunov equation, it learns the solution of the nonlinear coupled algebraic Riccati equation (CARE) and updates the optimal control gain accordingly. Then, a novel partially model-free PPI-RL algorithm is proposed for the case in which the dynamics of the MJS are partially unknown; here the optimal solution of the CARE is learned from the mixed input-output data of all modes. Furthermore, for MJSs with completely unknown dynamics, a completely model-free PPI-RL algorithm is developed that obtains the optimal control gain by removing the dependence on model information when solving the CARE. The proposed PPI-RL algorithms are proved to converge to the unique optimal solution of the CARE for MJSs with known, partially unknown, and completely unknown dynamics, respectively. Finally, simulation results demonstrate the feasibility and effectiveness of the PPI-RL algorithms.
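The model-based step described in the abstract (solve decoupled Lyapunov equations, then update each mode's gain) can be illustrated with a minimal sketch. It is not the paper's PPI-RL algorithm. It assumes a continuous-time linear MJS dx/dt = A_i x + B_i u with transition-rate matrix `Pi` and quadratic weights `Q_i`, `R_i`, none of which the abstract specifies. The decoupling shown here (freezing the other modes' P_j from the previous iterate) is one common textbook-style choice, and all names and numerical values are hypothetical.

```python
import numpy as np

def solve_lyapunov(Ac, M):
    """Solve Ac.T @ P + P @ Ac + M = 0 for P by Kronecker vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    L = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    P = np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def care_residual(A, B, Q, R, Pi, P):
    """Largest absolute entry of the coupled Riccati residual over all modes."""
    N = len(A)
    worst = 0.0
    for i in range(N):
        coupling = sum(Pi[i, j] * P[j] for j in range(N))
        res = (A[i].T @ P[i] + P[i] @ A[i] + Q[i] + coupling
               - P[i] @ B[i] @ np.linalg.solve(R[i], B[i].T @ P[i]))
        worst = max(worst, np.abs(res).max())
    return worst

def parallel_policy_iteration(A, B, Q, R, Pi, K0, max_iters=500, tol=1e-10):
    """Sketch: per-mode Lyapunov solves with the coupling frozen at the last iterate."""
    N, n = len(A), A[0].shape[0]
    K = [k.copy() for k in K0]
    P = [np.zeros((n, n)) for _ in range(N)]
    for _ in range(max_iters):
        P_new = []
        for i in range(N):
            # Policy evaluation: the pi_ii term is absorbed into the drift,
            # the other modes enter through the previous iterate P[j].
            Ac = A[i] - B[i] @ K[i] + 0.5 * Pi[i, i] * np.eye(n)
            M = Q[i] + K[i].T @ R[i] @ K[i] + sum(
                Pi[i, j] * P[j] for j in range(N) if j != i)
            P_new.append(solve_lyapunov(Ac, M))
        # Policy improvement for every mode.
        K = [np.linalg.solve(R[i], B[i].T @ P_new[i]) for i in range(N)]
        delta = max(np.abs(P_new[i] - P[i]).max() for i in range(N))
        P = P_new
        if delta < tol:
            break
    return P, K

# Hypothetical two-mode example; both A_i are stable, so K0 = 0 is admissible.
A = [np.array([[0.0, 1.0], [-1.0, -2.0]]), np.array([[-1.0, 0.5], [0.0, -3.0]])]
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [1.0]])]
Q = [np.eye(2), np.eye(2)]
R = [np.eye(1), np.eye(1)]
Pi = np.array([[-0.5, 0.5], [0.3, -0.3]])  # rows sum to zero
K0 = [np.zeros((1, 2)), np.zeros((1, 2))]

P, K = parallel_policy_iteration(A, B, Q, R, Pi, K0)
print("CARE residual:", care_residual(A, B, Q, R, Pi, P))
```

At a fixed point of this iteration, substituting K_i = R_i^{-1} B_i^T P_i into the Lyapunov equation recovers the coupled Riccati equation, which is why the residual check is a reasonable sanity test. The sketch makes no claim about convergence beyond this example.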
Pages: 8