Novel data-driven two-dimensional Q-learning for optimal tracking control of batch process with unknown dynamics

Cited by: 18
Authors
Wen, Xin [1]
Shi, Huiyuan [1,2,3]
Su, Chengli [1,4,7]
Jiang, Xueying [5]
Li, Ping [1,4]
Yu, Jingxian [6]
Affiliations
[1] Liaoning Petrochem Univ, Sch Informat & Control Engn, Fushun, Peoples R China
[2] Northwestern Polytech Univ, Sch Automat, Xian, Peoples R China
[3] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang, Peoples R China
[4] Univ Sci & Technol Liaoning, Sch Elect & Informat Engn, Anshan, Peoples R China
[5] Northeastern Univ, Sch Informat Sci & Engn, Shenyang, Peoples R China
[6] Liaoning Petrochem Univ, Sch Sci, Fushun, Peoples R China
[7] Liaoning Petrochem Univ, Sch Informat & Control Engn, Fushun 113001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Batch process; Data-driven; 2D off-policy Q-learning; Optimal tracking control; Injection molding; MODEL PREDICTIVE CONTROL; FAULT-TOLERANT CONTROL; STATE DELAY; DESIGN; FEEDBACK;
DOI
10.1016/j.isatra.2021.06.007
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Because previous control methods typically rely heavily on models of the batch process and struggle when a practical batch process has unknown dynamics, a novel data-driven two-dimensional (2D) off-policy Q-learning approach for optimal tracking control (OTC) is proposed to obtain a model-free control law for the batch process. First, an extended state-space equation composed of the state and the output tracking error is established to ensure the tracking performance of the designed controller. Second, based on this extended system, a behavior policy for generating data and a target policy for learning and optimization are introduced. Then, a Bellman equation that is independent of model parameters is derived by analyzing the relation between the 2D value function and the 2D Q-function. Policy iteration is carried out using only measured data collected along the batch and time directions of the batch process, so the optimal control problem can be solved without any knowledge of the system dynamics. The unbiasedness and convergence of the designed 2D off-policy Q-learning algorithm are proved. Finally, a simulation case on an injection molding process shows that the control and tracking performance gradually improve as the number of batches increases. (c) 2021 ISA. Published by Elsevier Ltd. All rights reserved.
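To make the approach in the abstract concrete, the following is a minimal Python/NumPy sketch of model-free off-policy Q-learning for linear quadratic tracking, in the spirit of the algorithm described above. Everything in it is an illustrative assumption rather than the paper's method: the extended system matrices A and B, the cost weights, the discount factor, and the behavior gain are invented, and the paper's 2D (batch and time) indexing is collapsed to a single time axis for brevity.

import numpy as np

rng = np.random.default_rng(0)

# Assumed extended system z' = A z + B u, where z stacks the state and the
# output tracking error. These matrices are illustrative, not the paper's
# injection molding model.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.5]])
n, m = 2, 1

Q_cost = np.eye(n)   # weight on the extended state (tracking error)
R_cost = np.eye(m)   # weight on the control effort
gamma = 0.95         # discount factor

# Behavior policy for data generation: zero gain plus exploration noise
# (A is stable here, so this policy is admissible).
K_behavior = np.zeros((m, n))

# Collect measured tuples (z, u, z') with the behavior policy.
N = 400
data = []
z = rng.normal(size=(n, 1))
for _ in range(N):
    u = -K_behavior @ z + 0.5 * rng.normal(size=(m, 1))  # exploration
    z_next = A @ z + B @ u
    data.append((z, u, z_next))
    z = z_next

def features(z, u):
    # Quadratic features phi = kron(w, w) for w = [z; u], so that
    # Q(z, u) = w^T H w = phi^T vec(H) is linear in the entries of H.
    w = np.vstack([z, u])
    return np.kron(w, w).ravel()

# Off-policy policy iteration on the Q-function.
K = np.zeros((m, n))   # target policy being evaluated and improved
for _ in range(20):
    Phi, rhs = [], []
    for z, u, z_next in data:
        u_next = -K @ z_next   # target policy applied at the next state
        # Model-free Bellman equation for the target policy:
        #   Q(z, u) - gamma * Q(z', -K z') = z^T Q_cost z + u^T R_cost u,
        # linear in the unknown entries of H; A and B never appear.
        Phi.append(features(z, u) - gamma * features(z_next, u_next))
        rhs.append(float(z.T @ Q_cost @ z + u.T @ R_cost @ u))
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(rhs), rcond=None)
    H = 0.5 * (h.reshape(n + m, n + m) + h.reshape(n + m, n + m).T)
    # Policy improvement: u = -H_uu^{-1} H_uz z.
    K_new = np.linalg.solve(H[n:, n:], H[n:, :n])
    if np.linalg.norm(K_new - K) < 1e-8:
        break
    K = K_new

print("learned feedback gain K =", K)

The point the sketch illustrates is that the Bellman regression uses only measured tuples (z, u, z') together with the target policy, never the system matrices, which is what makes the evaluation off-policy and model-free.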
Pages: 10-21
Number of pages: 12
Related Papers
50 records in total
  • [31] Computationally Efficient Data-Driven Higher Order Optimal Iterative Learning Control
    Chi, Ronghu
    Hou, Zhongsheng
    Jin, Shangtai
    Huang, Biao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (12) : 5971 - 5980
  • [32] Data-Driven H∞ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning
    Zhang, Li
    Fan, Jialu
    Xue, Wenqian
    Lopez, Victor G.
    Li, Jinna
    Chai, Tianyou
    Lewis, Frank L.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (07) : 3553 - 3567
  • [33] Off-policy two-dimensional reinforcement learning for optimal tracking control of batch processes with network-induced dropout and disturbances
    Jiang, Xueying
    Huang, Min
    Shi, Huiyuan
    Wang, Xingwei
    Zhang, Yanfeng
    ISA TRANSACTIONS, 2024, 144 : 228 - 244
  • [34] Data-Driven Adaptive Dynamic Programming for Optimal Control of Continuous-Time Multicontroller Systems With Unknown Dynamics
    Zhao, Jingang
    IEEE ACCESS, 2022, 10 : 41503 - 41511
  • [35] FINITE-HORIZON OPTIMAL CONTROL OF DISCRETE-TIME LINEAR SYSTEMS WITH COMPLETELY UNKNOWN DYNAMICS USING Q-LEARNING
    Zhao, Jingang
    Zhang, Chi
    JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION, 2021, 17 (03) : 1471 - 1483
  • [36] Robust control scheme for a class of uncertain nonlinear systems with completely unknown dynamics using data-driven reinforcement learning method
    Jiang, He
    Zhang, Huaguang
    Cui, Yang
    Xiao, Geyang
    NEUROCOMPUTING, 2018, 273 : 68 - 77
  • [37] The Adaptive Optimal Output Feedback Tracking Control of Unknown Discrete-Time Linear Systems Using a Multistep Q-Learning Approach
    Dong, Xunde
    Lin, Yuxin
    Suo, Xudong
    Wang, Xihao
    Sun, Weijie
    MATHEMATICS, 2024, 12 (04)
  • [38] Data-Driven Optimal Consensus Control for Discrete-Time Multi-Agent Systems With Unknown Dynamics Using Reinforcement Learning Method
    Zhang, Huaguang
    Jiang, He
    Luo, Yanhong
    Xiao, Geyang
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2017, 64 (05) : 4091 - 4100
  • [39] Reinforcement Q-Learning Algorithm for H∞ Tracking Control of Unknown Discrete-Time Linear Systems
    Peng, Yunjian
    Chen, Qian
    Sun, Weijie
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (11) : 4109 - 4122
  • [40] Q-learning based tracking control with novel finite-horizon performance index
    Wang, Wei
    Wang, Ke
    Huang, Zixin
    Mu, Chaoxu
    Shi, Haoxian
    INFORMATION SCIENCES, 2024, 681