Reinforcement Learning-Based Model Predictive Control for Discrete-Time Systems

Times Cited: 18
Authors
Lin, Min [1 ]
Sun, Zhongqi [1 ,2 ]
Xia, Yuanqing [1 ]
Zhang, Jinhui [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Yangtze Delta Reg Acad, Jiaxing 314019, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Discrete-time systems; model predictive control; policy iteration (PI); reinforcement learning (RL); CONSTRAINED MPC; STABILITY;
DOI
10.1109/TNNLS.2023.3273590
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This article proposes a novel reinforcement learning-based model predictive control (RLMPC) scheme for discrete-time systems. The scheme integrates model predictive control (MPC) and reinforcement learning (RL) through policy iteration (PI): MPC serves as the policy generator, and the RL technique is employed to evaluate the policy. The resulting value function is then taken as the terminal cost of MPC, thereby improving the generated policy. The advantage of this design is that it eliminates the offline design of the terminal cost, the auxiliary controller, and the terminal constraint required in traditional MPC. Moreover, the proposed RLMPC allows a more flexible choice of the prediction horizon because the terminal constraint is removed, which has great potential to reduce the computational burden. We provide a rigorous analysis of the convergence, feasibility, and stability properties of RLMPC. Simulation results show that RLMPC achieves nearly the same performance as traditional MPC in the control of linear systems and is superior to traditional MPC for nonlinear ones.
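For intuition, the PI loop described in the abstract can be sketched for the unconstrained linear-quadratic case, where MPC policy generation reduces to a backward Riccati recursion and policy evaluation (the "RL" step) reduces to a discrete Lyapunov equation. This is an illustrative reconstruction, not the authors' code: the system matrices, horizon, and cost weights below are arbitrary examples chosen for the sketch.

```python
# Illustrative sketch of RL-based MPC via policy iteration (LQ case).
# MPC generates the policy; policy evaluation computes the closed-loop
# value function, which becomes the next terminal cost of the MPC.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # example double-integrator dynamics
B = np.array([[0.005], [0.1]])
Q = np.eye(2)                             # stage cost x'Qx + u'Ru
R = np.array([[0.1]])
N = 3                                     # short prediction horizon

def mpc_gain(P_term):
    """Policy generation: N-step MPC with terminal cost x'P_term x.
    In the unconstrained LQ case this is a backward Riccati recursion;
    the last gain computed is the receding-horizon policy u = -K x."""
    P = P_term
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return K

def evaluate_policy(K):
    """Policy evaluation: solve the Bellman (Lyapunov) equation
    P = (A - BK)' P (A - BK) + Q + K'RK for the closed-loop value."""
    Acl = A - B @ K
    return solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)

P = np.zeros((2, 2))                      # start from zero terminal cost
for _ in range(50):                       # policy iteration loop
    K = mpc_gain(P)                       # policy improvement via MPC
    P_next = evaluate_policy(K)           # policy evaluation
    if np.allclose(P_next, P, atol=1e-10):
        break
    P = P_next

P_opt = solve_discrete_are(A, B, Q, R)    # infinite-horizon optimum (DARE)
print(np.allclose(P, P_opt, atol=1e-6))
```

In this LQ setting the iteration recovers the solution of the discrete algebraic Riccati equation, so the short-horizon MPC with the learned terminal cost matches the infinite-horizon optimal controller, which mirrors the abstract's claim that the learned terminal cost replaces the offline-designed one.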
Pages: 3312-3324
Number of Pages: 13