A Q-learning predictive control scheme with guaranteed stability

Cited by: 9
Authors
Beckenbach, Lukas [1 ]
Osinenko, Pavel [1 ]
Streif, Stefan [1 ]
Affiliations
[1] Tech Univ Chemnitz, Automat Control & Syst Dynam Lab, D-09107 Chemnitz, Germany
Keywords
Predictive control; Q-Learning; Cost shaping; Nominal stability; RECEDING-HORIZON CONTROL; DISCRETE-TIME-SYSTEMS; NONLINEAR-SYSTEMS; FINITE; PERFORMANCE; MPC; STATE;
DOI
10.1016/j.ejcon.2020.03.001
Chinese Library Classification (CLC)
TP [Automation and computer technology];
Subject classification code
0812
Abstract
Model-based predictive controllers are used to tackle control tasks in which constraints on the state, the input, or both must be satisfied. These controllers commonly optimize a fixed finite-horizon cost, which relates to an infinite-horizon (IH) cost profile, while the resulting closed loop under the predictive controller generally yields a suboptimal IH cost. To capture the optimal IH cost and the associated control policy, reinforcement learning methods such as Q-learning, which approximate said cost via a parametric architecture, can be employed. In contrast to predictive controllers, however, closed-loop stability has rarely been investigated for the controller associated with this approximation in explicit dependence on its parameters. The aim of this work is to incorporate model-based Q-learning into a predictive control setup so as to provide closed-loop stability during online learning, while eventually improving the performance of finite-horizon controllers. The proposed scheme provides nominal asymptotic stability, and it was observed that the suggested learning approach can in fact improve performance over a baseline predictive controller. (c) 2020 European Control Association. Published by Elsevier Ltd. All rights reserved.
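The following is a minimal, illustrative sketch (not the authors' implementation) of the idea described in the abstract: a finite-horizon predictive controller whose terminal cost is a parametric Q-function that is updated online. The dynamics A and B, the weights Q and R, the horizon N, the learning rate alpha, and the quadratic parameterization q_approx are all assumptions made here for illustration, and a simple SARSA-style temporal-difference update stands in for the paper's learning rule; the cost shaping and the constraints that yield the stability guarantee are not reproduced.

```python
# Hedged sketch: finite-horizon predictive control with a learned terminal cost.
# All system matrices, weights, and hyperparameters below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed linear dynamics x+ = A x + B u
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)         # assumed stage-cost weights
N, gamma, alpha = 5, 1.0, 1e-3            # horizon, discount, learning rate

def stage_cost(x, u):
    return float(x @ Q @ x + u @ R @ u)

def q_approx(theta, x, u):
    z = np.concatenate([x, u])            # quadratic Q-function in (x, u)
    return float(z @ theta.reshape(3, 3) @ z)

def mpc_action(theta, x0):
    """Finite-horizon optimization with the learned Q-function as terminal cost."""
    def horizon_cost(u_flat):
        u_seq = u_flat.reshape(N, 1)
        x, J = x0.copy(), 0.0
        for k in range(N - 1):
            J += stage_cost(x, u_seq[k])
            x = A @ x + B @ u_seq[k]
        return J + q_approx(theta, x, u_seq[-1])
    res = minimize(horizon_cost, np.zeros(N))
    return res.x[:1]                       # apply only the first input (receding horizon)

def td_update(theta, x, u, x_next, u_next):
    """One temporal-difference step on the terminal-cost parameters."""
    target = stage_cost(x, u) + gamma * q_approx(theta, x_next, u_next)
    z = np.concatenate([x, u])
    grad = np.outer(z, z).ravel()          # gradient of the quadratic form w.r.t. theta
    return theta + alpha * (target - q_approx(theta, x, u)) * grad

theta = np.eye(3).ravel()                  # initial terminal-cost parameters
x = np.array([1.0, 0.0])
for _ in range(50):                        # online learning along the closed loop
    u = mpc_action(theta, x)
    x_next = A @ x + B @ u
    u_next = mpc_action(theta, x_next)
    theta = td_update(theta, x, u, x_next, u_next)
    x = x_next
print("final state:", x)
```

In this sketch the learned Q-function only shapes the terminal cost of an otherwise standard receding-horizon loop; the paper's contribution is to constrain such online updates so that nominal asymptotic stability is retained throughout learning.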
Pages: 167-178
Number of pages: 12