Off-policy two-dimensional reinforcement learning for optimal tracking control of batch processes with network-induced dropout and disturbances

Cited by: 2
Authors
Jiang, Xueying [1 ,2 ]
Huang, Min [1 ,2 ]
Shi, Huiyuan [2 ,3 ]
Wang, Xingwei [4 ]
Zhang, Yanfeng [4 ]
Affiliations
[1] Northeastern Univ, Coll Informat Sci & Engn, Qinhuangdao, Peoples R China
[2] Northeastern Univ, State Key Lab Synthet Automa Proc Ind, Qinhuangdao, Peoples R China
[3] Liaoning Petrochem Univ, Sch Informat & Control Engn, Fushun, Peoples R China
[4] Northeastern Univ, Coll Comp Sci & Engn, Fushun, Peoples R China
Keywords
Two-dimensional (2D); Reinforcement learning; Optimal tracking control; Batch processes; Network-induced dropout; Injection velocity; TIME-SYSTEMS; GAMES;
DOI
10.1016/j.isatra.2023.11.011
CLC number
TP [Automation and computer technology];
Subject classification code
0812 ;
Abstract
In this paper, a new off-policy two-dimensional (2D) reinforcement learning approach is proposed to address the optimal tracking control (OTC) problem of batch processes subject to network-induced dropout and disturbances. A dropout 2D augmented Smith predictor is first devised to estimate the current extended state using past data along both the time and batch directions. The dropout 2D value function and Q-function are then defined, and their relationship is analyzed to establish optimality. On this basis, the dropout 2D Bellman equation is derived from the Q-function. To solve the dropout 2D OTC problem of batch processes, two algorithms are presented: an off-line 2D policy iteration algorithm and an off-policy 2D Q-learning algorithm. The latter uses only the input and the estimated state, without requiring knowledge of the underlying system dynamics. The unbiasedness of the solutions and the convergence of the algorithms are analyzed separately. The effectiveness of the proposed methods is finally validated on a simulated case of the filling process.
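The core off-policy idea described in the abstract, evaluating and improving a target policy from data generated by a different exploratory behavior policy, using only inputs and measured states, can be illustrated with a minimal one-dimensional Q-learning sketch. This is a generic sketch, not the paper's algorithm: it drops the batch dimension, dropout compensation, and disturbances, and all numerical values (system gains, cost weights, gain initialization) are illustrative assumptions.

```python
import numpy as np

# Hypothetical scalar process model (illustrative numbers, not from the paper)
A, B = 0.9, 0.5          # state and input gains
Qc, Rc = 1.0, 0.1        # stage-cost weights

rng = np.random.default_rng(0)

# Collect data once with an exploratory behavior policy (off-policy setting:
# the learned target policy never generates this data)
x, data = 1.0, []
for _ in range(200):
    u = -0.3 * x + rng.normal(scale=1.0)   # behavior policy + exploration noise
    x_next = A * x + B * u
    data.append((x, u, x_next))
    x = x_next if abs(x_next) < 10 else rng.normal()

def phi(x, u):
    """Quadratic basis so that Q(x,u) = Hxx*x^2 + 2*Hxu*x*u + Huu*u^2."""
    return np.array([x * x, 2 * x * u, u * u])

# Policy iteration on the Q-function over the fixed dataset
K = 0.0                                    # initial stabilizing gain (A is stable)
for _ in range(20):
    Phi, r = [], []
    for (xk, uk, xn) in data:
        un = -K * xn                       # target-policy action at the next state
        Phi.append(phi(xk, uk) - phi(xn, un))   # Bellman-equation regressor
        r.append(Qc * xk * xk + Rc * uk * uk)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(r), rcond=None)
    Hxx, Hxu, Huu = theta
    K = Hxu / Huu                          # greedy policy improvement

# Model-based check: LQR gain from the scalar Riccati recursion
P = Qc
for _ in range(500):
    P = Qc + A * P * A - (A * P * B) ** 2 / (Rc + B * P * B)
K_star = (B * P * A) / (Rc + B * P * B)
print(abs(K - K_star))                     # learned gain matches the Riccati gain
```

The model parameters `A` and `B` appear only in the data-generation loop; the learning loop itself touches nothing but the stored `(x, u, x_next)` tuples, which mirrors the abstract's claim of using only inputs and estimated states rather than the system model.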
Pages: 228 - 244
Page count: 17
Related papers
34 records in total
  • [1] Optimal tracking control of nonlinear batch processes with unknown dynamics using two-dimensional off-policy interleaved Q-learning algorithm
    Shi, Huiyuan
    Gao, Wei
    Jiang, Xueying
    Su, Chengli
    Li, Ping
    INTERNATIONAL JOURNAL OF CONTROL, 2024, 97 (10) : 2329 - 2341
  • [2] Optimal robust online tracking control for space manipulator in task space using off-policy reinforcement learning
    Zhuang, Hongji
    Zhou, Hang
    Shen, Qiang
    Wu, Shufan
    Razoumny, Vladimir Yu.
    Razoumny, Yury N.
    AEROSPACE SCIENCE AND TECHNOLOGY, 2024, 153
  • [3] Off-policy reinforcement learning algorithm for robust optimal control of uncertain nonlinear systems
    Amirparast, Ali
    Kamal Hosseini Sani, S.
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (08) : 5419 - 5437
  • [4] Off-policy integral reinforcement learning optimal tracking control for continuous-time chaotic systems
    Wei Qing-Lai
    Song Rui-Zhuo
    Sun Qiu-Ye
    Xiao Wen-Dong
    CHINESE PHYSICS B, 2015, 24 (09)
  • [5] Hyperparameter Tuning of an Off-Policy Reinforcement Learning Algorithm for H∞ Tracking Control
    Farahmandi, Alireza
    Reitz, Brian
    Debord, Mark
    Philbrick, Douglas
    Estabridis, Katia
    Hewer, Gary
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [6] Optimal Control of Iron-Removal Systems Based on Off-Policy Reinforcement Learning
    Chen, Ning
    Luo, Shuhan
    Dai, Jiayang
    Luo, Biao
    Gui, Weihua
    IEEE ACCESS, 2020, 8 (08) : 149730 - 149740
  • [7] Optimal Control for Multi-agent Systems Using Off-Policy Reinforcement Learning
    Wang, Hao
    Chen, Zhiru
    Wang, Jun
    Lu, Lijun
    Li, Mingzhe
    2022 4TH INTERNATIONAL CONFERENCE ON CONTROL AND ROBOTICS, ICCR, 2022 : 135 - 140
  • [8] Optimal tracking control of batch processes with time-invariant state delay: Adaptive Q-learning with two-dimensional state and control policy
    Shi, Huiyuan
    Lv, Mengdi
    Jiang, Xueying
    Su, Chengli
    Li, Ping
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 132
  • [9] Two-Dimensional Model-Free Optimal Tracking Control for Batch Processes With Packet Loss
    Shi, Huiyuan
    Wen, Xin
    Jiang, Xueying
    Su, Chengli
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2023, 10 (02) : 1032 - 1045