Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming

Cited by: 3

Authors:

Bertsekas, Dimitri P. [1]

Affiliation:

[1] Arizona State Univ, Tempe, AZ 85287 USA

Source:

IFAC PAPERSONLINE | 2024 / Vol. 58 / Issue 18

Keywords:

Model Predictive Control; Adaptive Control; Dynamic Programming; Reinforcement Learning; Newton's Method; DISCRETE-TIME-SYSTEMS; NEWTON METHODS; REACHABILITY; ALGORITHMS; STABILITY; GAME; SETS; GO;

DOI:

10.1016/j.ifacol.2024.09.056

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the off-line training and the on-line play algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess, [SHS17], [SSS17]), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon, [Tes94], [Tes95], [TeG96]). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents. Significantly, the synergy between off-line training and on-line play also underlies MPC (as well as other major classes of sequential decision problems), and indeed the MPC design architecture is very similar to the one of AlphaZero and TD-Gammon. This conceptual insight provides a vehicle for bridging the cultural gap between RL and MPC, and sheds new light on some fundamental issues in MPC. These include the enhancement of stability properties through rollout, the treatment of uncertainty through the use of certainty equivalence, the resilience of MPC in adaptive control settings that involve changing system parameters, and the insights provided by the superlinear performance bounds implied by Newton's method. Copyright (C) 2024 The Authors.
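
As an illustration of the off-line training / on-line play split described in the abstract, the following is a minimal sketch on a scalar linear-quadratic problem. The problem itself (system `x_next = a*x + b*u`, stage cost `q*x**2 + r*u**2`), the numerical values, and all function names are assumptions chosen for illustration; they are not taken from the paper. A rough quadratic cost approximation `K_tilde * x**2` stands in for the result of off-line training, and on-line play is modeled as one-step lookahead minimization against that approximation.

```python
# Hypothetical scalar linear-quadratic example (not from the abstract):
# system x_next = a*x + b*u, stage cost q*x**2 + r*u**2, cost functions K*x**2.

def optimal_cost_coefficient(a, b, q, r, iters=200):
    """Approximate the optimal K* by iterating the scalar Riccati equation."""
    K = q
    for _ in range(iters):
        K = q + a * a * r * K / (r + b * b * K)
    return K

def lookahead_gain(K_tilde, a, b, r):
    """On-line play: gain L of u = L*x minimizing
    r*u**2 + K_tilde*(a*x + b*u)**2 over u (one-step lookahead)."""
    return -a * b * K_tilde / (r + b * b * K_tilde)

def policy_cost_coefficient(L, a, b, q, r):
    """Cost coefficient K_L of the linear policy u = L*x (requires stability)."""
    c = a + b * L  # closed-loop coefficient
    if abs(c) >= 1.0:
        raise ValueError("policy is not stabilizing")
    return (q + r * L * L) / (1.0 - c * c)

a, b, q, r = 1.0, 1.0, 1.0, 1.0   # assumed problem data
K_tilde = 1.0                     # stand-in for an off-line trained approximation

K_star = optimal_cost_coefficient(a, b, q, r)
L = lookahead_gain(K_tilde, a, b, r)
K_policy = policy_cost_coefficient(L, a, b, q, r)

print("error of off-line approximation:", abs(K_tilde - K_star))
print("error of on-line lookahead policy:", abs(K_policy - K_star))
```

In this particular instance the cost of the lookahead policy is much closer to the optimum than the off-line approximation it started from, which is consistent with the Newton-step interpretation mentioned in the abstract; the sketch does not establish that behavior in general.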

Pages: 363-383

Page count: 21

50 items in total

[1] Adaptive parameterized model predictive control based on reinforcement learning: A synthesis framework
Sun, Dingshan
Jamshidnejad, Anahita
De Schutter, Bart
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 136
[2] Model Predictive Control-Based Reinforcement Learning
Han, Qiang
Boussaid, Farid
Bennamoun, Mohammed
2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
[3] A unified framework to control estimation error in reinforcement learning
Zhang, Yujia
Li, Lin
Wei, Wei
Lv, Yunpeng
Liang, Jiye
NEURAL NETWORKS, 2024, 178
[4] Dynamic programming and model predictive control
Meadows, ES
PROCEEDINGS OF THE 1997 AMERICAN CONTROL CONFERENCE, VOLS 1-6, 1997, : 1635 - 1639
[5] Model Predictive Control and Dynamic Programming
Lee, Jay H.
2011 11TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS), 2011, : 1807 - 1809
[6] Model Predictive Control of Quadruped Robot Based on Reinforcement Learning
Zhang, Zhitong
Chang, Xu
Ma, Hongxu
An, Honglei
Lang, Lin
APPLIED SCIENCES-BASEL, 2023, 13 (01):
[7] A reinforcement learning-based economic model predictive control framework for autonomous operation of chemical reactors
Alhazmi, Khalid
Albalawi, Fahad
Sarathy, S. Mani
CHEMICAL ENGINEERING JOURNAL, 2022, 428
[8] An Improved Reinforcement Learning Based Heuristic Dynamic Programming Algorithm for Model-Free Optimal Control
Li, Jia
Yuan, Zhaolin
Ban, Xiaojuan
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 282 - 294
[9] Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control
Lewis, Frank L.
Vrabie, Draguna
IEEE CIRCUITS AND SYSTEMS MAGAZINE, 2009, 9 (03) : 32 - 50
[10] A Unified Framework for Optimality Analysis of Model Predictive Control
Cai, Xin
Li, Shaoyuan
Li, Ning
Li, Kang
2014 11TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2014, : 1688 - 1693