Unconstrained feedback controller design using Q-learning from noisy data

Times Cited: 0
Authors
Kumar, Pratyush [1 ]
Rawlings, James B. [1 ]
Affiliations
[1] Univ Calif Santa Barbara, Dept Chem Engn, Santa Barbara, CA 93106 USA
Keywords
Reinforcement learning; Q-learning; Least squares policy iteration; System identification; Maximum likelihood estimation; Linear quadratic regulator; MODEL-PREDICTIVE CONTROL; REINFORCEMENT; STABILITY; MPC;
DOI
10.1016/j.compchemeng.2023.108325
Chinese Library Classification
TP39 [Computer applications];
Subject Classification Codes
081203; 0835;
Abstract
This paper develops a novel model-free, Q-learning-based approach to estimate linear, unconstrained feedback controllers from noisy process data. The proposed method extends an available approach developed to estimate the linear quadratic regulator (LQR) for linear systems with full state measurements driven by Gaussian process noise of known covariance. First, we modify the approach to treat the case of an unknown noise covariance. Then, we use the modified approach to estimate a feedback controller for linear systems with both process and measurement noise and only output measurements. We also present a model-based maximum likelihood estimation (MLE) approach to determine a linear dynamic model and noise covariances from data, which is used to construct a regulator and state estimator for comparisons in simulation studies. The performances of the model-free and model-based controller estimation approaches are compared on an example heating, ventilation, and air-conditioning (HVAC) system. We show that the proposed Q-learning approach estimates a reasonably accurate feedback controller from 24 h of noisy data. The controllers estimated using the model-free and model-based approaches provide similar closed-loop performance, with losses of 3.5% and 2.7%, respectively, compared to a perfect controller that uses the true dynamic model and noise covariances of the HVAC system. Finally, we give future work directions for the model-free controller design approaches by discussing some remaining advantages of the model-based approaches.
Pages: 13
Related Papers
50 records in total; first 10 shown
  • [1] Design of a fuzzy logic controller with Evolutionary Q-Learning
    Kim, Min-Soeng
    Lee, Ju-Jang
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2006, 12 (04) : 369 - 381
  • [2] Data-Driven Optimal Controller Design for Maglev Train: Q-Learning Method
    Xin, Liang
    Jiang, Hongwei
    Wen, Tao
    Long, Zhiqiang
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022, : 1289 - 1294
  • [3] Reactive fuzzy controller design by Q-learning for mobile robot navigation
    Zhang, Wen-Zhi
    Lyu, Tian-Sheng
    Journal of Harbin Institute of Technology, 2005, (03) : 319 - 324
  • [4] Q-LEARNING WITH CENSORED DATA
    Goldberg, Yair
    Kosorok, Michael R.
    ANNALS OF STATISTICS, 2012, 40 (01) : 529 - 560
  • [5] Model-free Resilient Controller Design based on Incentive Feedback Stackelberg Game and Q-learning
    Shen, Jiajun
    Li, Fengjun
    Hashemi, Morteza
    Fang, Huazhen
    2024 IEEE 7TH INTERNATIONAL CONFERENCE ON INDUSTRIAL CYBER-PHYSICAL SYSTEMS, ICPS 2024, 2024,
  • [6] Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks
    Ghazanfari, Behzad
    Mozayani, Nasser
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 26 (06) : 2771 - 2783
  • [7] Inverse Q-Learning Using Input-Output Data
    Lian, Bosen
    Xue, Wenqian
    Lewis, Frank L.
    Davoudi, Ali
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (02) : 728 - 738
  • [8] Design of Dynamic Fuzzy Q-Learning Controller for Networked Wind Energy Conversion Systems
    Wanigasekara, Chathura
    Swain, Akshya
    Almakhles, Dhafer
    Zhou, Lv
    2020 20TH IEEE INTERNATIONAL CONFERENCE ON ENVIRONMENT AND ELECTRICAL ENGINEERING AND 2020 4TH IEEE INDUSTRIAL AND COMMERCIAL POWER SYSTEMS EUROPE (EEEIC/I&CPS EUROPE), 2020,
  • [9] Q-learning approach to asymptotic feedback set stabilization with missing data in networked systems
    Li, Yan
    Yang, Ziyi
    Huang, Chi
    Xiong, Wenjun
    ENGINEERING ANALYSIS WITH BOUNDARY ELEMENTS, 2025, 177
  • [10] Model based path planning using Q-Learning
    Sharma, Avinash
    Gupta, Kanika
    Kumar, Anirudha
    Sharma, Aishwarya
    Kumar, Rajesh
    2017 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2017, : 837 - 842