Linear Quadratic Control Using Model-Free Reinforcement Learning

Cited by: 35

Authors:

Yaghmaie, Farnaz Adib [1]

Gustafsson, Fredrik [1]

Ljung, Lennart [1]

Affiliations:

[1] Linkoping Univ, Dept Elect Engn, S-58431 Linkoping, Sweden

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2023 / Vol. 68 / No. 02

Funding:

Swedish Research Council;

Keywords:

Noise measurement; Costs; Dynamical systems; Adaptation models; Heuristic algorithms; Process control; Optimal control; Linear quadratic (LQ) control; reinforcement learning (RL); SYSTEMS;

DOI:

10.1109/TAC.2022.3145632

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In this article, we consider the linear quadratic (LQ) control problem with process and measurement noises. We analyze the LQ problem in terms of the average cost and the structure of the value function. We assume that the dynamics of the linear system are unknown and only noisy measurements of the state variable are available. Using noisy measurements of the state variable, we propose two model-free iterative algorithms to solve the LQ problem. The proposed algorithms are variants of the policy iteration routine where the policy is greedy with respect to the average of all previous iterations. We rigorously analyze the properties of the proposed algorithms, including stability of the generated controllers and convergence. We analyze the effect of measurement noise on the performance of the proposed algorithms, the classical off-policy, and the classical Q-learning routines. We also investigate a model-building approach, inspired by adaptive control, where a model of the dynamical system is estimated and the optimal control problem is solved assuming that the estimated model is the true model. We use a benchmark to evaluate and compare our proposed algorithms with the classical off-policy, the classical Q-learning, and the policy gradient. We show that our model-building approach performs nearly identically to the analytical solution and our proposed policy iteration-based algorithms outperform the classical off-policy and the classical Q-learning algorithms on this benchmark but do not outperform the model-building approach.

Pages: 737-752

Number of pages: 16

50 items in total

[41] MFRL-BI: Design of a model-free reinforcement learning process control scheme by using Bayesian inference [J].

Li, Yanrong ;

Du, Juan ;

Jiang, Wei ;

Tsung, Fugee .

IISE TRANSACTIONS, 2025, 57 (09) :1056-1070

[42] Model-free Optimal Coordinated Control for Rigidly Connected Dual-motor Systems Using Reinforcement Learning [J].

Yang C. ;

Wang H. ;

Zhao J. .

Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2024, 44 (09) :3691-3701

[43] Model-Free Distributed Reinforcement Learning State Estimation of a Dynamical System Using Integral Value Functions [J].

Salamat, Babak ;

Elsbacher, Gerhard ;

Tonello, Andrea M. ;

Belzner, Lenz .

IEEE OPEN JOURNAL OF CONTROL SYSTEMS, 2023, 2 :70-78

[44] Designing a Model-Free Reinforcement Learning Controller for a Flexible-Link Manipulator [J].

Raoufi, Mona ;

Delavari, Hadi .

2021 9TH RSI INTERNATIONAL CONFERENCE ON ROBOTICS AND MECHATRONICS (ICROM), 2021, :1-6

[45] Optimal model-free adaptive control based on reinforcement Q-Learning for solar thermal collector fields [J].

Pataro, Igor M. L. ;

Cunha, Rita ;

Gil, Juan D. ;

Guzman, Jose L. ;

Berenguel, Manuel ;

Lemos, Joao M. .

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126

[46] Shared Linear Quadratic Regulation Control: A Reinforcement Learning Approach [J].

Abu-Khalaf, Murad ;

Karaman, Sertac ;

Rus, Daniela .

2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, :4569-4576

[47] Plume Tracing via Model-Free Reinforcement Learning Method [J].

Hu, Hangkai ;

Song, Shiji ;

Chen, C. L. Philip .

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (08) :2515-2527

[48] Model-Free Solution for Inverse Linear-Quadratic Nonzero-Sum Differential Games [J].

Martirosyan, Emin ;

Cao, Ming .

IEEE CONTROL SYSTEMS LETTERS, 2024, 8 :2445-2450

[49] On Robust Model-Free Reduced-Dimensional Reinforcement Learning Control for Singularly Perturbed Systems [J].

Mukherjee, Sayak ;

Bai, He ;

Chakrabortty, Aranya .

2020 AMERICAN CONTROL CONFERENCE (ACC), 2020, :3914-3919

[50] An Improved Reinforcement Learning Based Heuristic Dynamic Programming Algorithm for Model-Free Optimal Control [J].

Li, Jia ;

Yuan, Zhaolin ;

Ban, Xiaojuan .

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 :282-294
