A Q-learning predictive control scheme with guaranteed stability

Cited by: 9
Authors
Beckenbach, Lukas [1 ]
Osinenko, Pavel [1 ]
Streif, Stefan [1 ]
Affiliations
[1] Tech Univ Chemnitz, Automat Control & Syst Dynam Lab, D-09107 Chemnitz, Germany
Keywords
Predictive control; Q-Learning; Cost shaping; Nominal stability; RECEDING-HORIZON CONTROL; DISCRETE-TIME-SYSTEMS; NONLINEAR-SYSTEMS; FINITE; PERFORMANCE; MPC; STATE;
DOI
10.1016/j.ejcon.2020.03.001
Chinese Library Classification (CLC)
TP [Automation and computer technology];
Subject classification code
0812
Abstract
Model-based predictive controllers are used to tackle control tasks in which constraints on the state, the input, or both must be satisfied. These controllers commonly optimize a fixed finite-horizon cost, which relates to an infinite-horizon (IH) cost profile, while the resulting closed loop under the predictive controller generally yields a suboptimal IH cost. To capture the optimal IH cost and the associated control policy, reinforcement learning methods such as Q-learning, which approximate said cost via a parametric architecture, can be employed. In contrast to predictive controllers, however, closed-loop stability has rarely been investigated for the controller associated with this approximation in explicit dependence on its parameters. The aim of this work is to incorporate model-based Q-learning into a predictive control setup so as to provide closed-loop stability during online learning, while eventually improving the performance of finite-horizon controllers. The proposed scheme provides nominal asymptotic stability, and it was observed that the suggested learning approach can in fact improve performance over a baseline predictive controller. (c) 2020 European Control Association. Published by Elsevier Ltd. All rights reserved.
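The following is a minimal, illustrative sketch (not the authors' implementation) of the idea described in the abstract: a finite-horizon predictive controller whose terminal cost is a parametric Q-function that is updated online. The dynamics A and B, the weights Q and R, the horizon N, the learning rate alpha, and the quadratic parameterization q_approx are all assumptions made here for illustration, and a simple SARSA-style temporal-difference update stands in for the paper's learning rule; the cost shaping and the constraints that yield the stability guarantee are not reproduced.

```python
# Hedged sketch: finite-horizon predictive control with a learned terminal cost.
# All system matrices, weights, and hyperparameters below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed linear dynamics x+ = A x + B u
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)         # assumed stage-cost weights
N, gamma, alpha = 5, 1.0, 1e-3            # horizon, discount, learning rate

def stage_cost(x, u):
    return float(x @ Q @ x + u @ R @ u)

def q_approx(theta, x, u):
    z = np.concatenate([x, u])            # quadratic Q-function in (x, u)
    return float(z @ theta.reshape(3, 3) @ z)

def mpc_action(theta, x0):
    """Finite-horizon optimization with the learned Q-function as terminal cost."""
    def horizon_cost(u_flat):
        u_seq = u_flat.reshape(N, 1)
        x, J = x0.copy(), 0.0
        for k in range(N - 1):
            J += stage_cost(x, u_seq[k])
            x = A @ x + B @ u_seq[k]
        return J + q_approx(theta, x, u_seq[-1])
    res = minimize(horizon_cost, np.zeros(N))
    return res.x[:1]                       # apply only the first input (receding horizon)

def td_update(theta, x, u, x_next, u_next):
    """One temporal-difference step on the terminal-cost parameters."""
    target = stage_cost(x, u) + gamma * q_approx(theta, x_next, u_next)
    z = np.concatenate([x, u])
    grad = np.outer(z, z).ravel()          # gradient of the quadratic form w.r.t. theta
    return theta + alpha * (target - q_approx(theta, x, u)) * grad

theta = np.eye(3).ravel()                  # initial terminal-cost parameters
x = np.array([1.0, 0.0])
for _ in range(50):                        # online learning along the closed loop
    u = mpc_action(theta, x)
    x_next = A @ x + B @ u
    u_next = mpc_action(theta, x_next)
    theta = td_update(theta, x, u, x_next, u_next)
    x = x_next
print("final state:", x)
```

In this sketch the learned Q-function only shapes the terminal cost of an otherwise standard receding-horizon loop; the paper's contribution is to constrain such online updates so that nominal asymptotic stability is retained throughout learning.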
Pages: 167-178
Number of pages: 12