Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming

Cited by: 3
Authors
Bertsekas, Dimitri P. [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85287 USA
Source
IFAC PAPERSONLINE | 2024, Vol. 58, Issue 18
Keywords
Model Predictive Control; Adaptive Control; Dynamic Programming; Reinforcement Learning; Newton's Method; DISCRETE-TIME-SYSTEMS; NEWTON METHODS; REACHABILITY; ALGORITHMS; STABILITY; GAME; SETS; GO
DOI
10.1016/j.ifacol.2024.09.056
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the off-line training and the on-line play algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess, [SHS17], [SSS17]), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon, [Tes94], [Tes95], [TeG96]). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents. Significantly, the synergy between off-line training and on-line play also underlies MPC (as well as other major classes of sequential decision problems), and indeed the MPC design architecture is very similar to the one of AlphaZero and TD-Gammon. This conceptual insight provides a vehicle for bridging the cultural gap between RL and MPC, and sheds new light on some fundamental issues in MPC. These include the enhancement of stability properties through rollout, the treatment of uncertainty through the use of certainty equivalence, the resilience of MPC in adaptive control settings that involve changing system parameters, and the insights provided by the superlinear performance bounds implied by Newton's method. Copyright (C) 2024 The Authors.
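The link between on-line play and Newton's method can be checked numerically on a toy problem. The sketch below is an illustration only and is not taken from this record: it assumes a scalar linear-quadratic problem with dynamics x_{k+1} = a*x_k + b*u_k and stage cost q*x^2 + r*u^2, with hypothetical parameter values, and treats the quadratic coefficient `K_tilde` as the output of off-line training. Under those assumptions, the cost coefficient of the one-step lookahead policy coincides with one Newton iteration for the Riccati fixed-point equation K = F(K) started at `K_tilde`.

```python
import math

# Hypothetical scalar LQ problem (all values are assumptions for illustration).
A, B, Q, R = 1.0, 2.0, 1.0, 0.5


def riccati(K, a=A, b=B, q=Q, r=R):
    """Riccati operator F(K) = q + a^2 r K / (r + b^2 K)."""
    return q + a**2 * r * K / (r + b**2 * K)


def riccati_slope(K, a=A, b=B, r=R):
    """Derivative F'(K) = (a r / (r + b^2 K))^2."""
    return (a * r / (r + b**2 * K)) ** 2


def newton_step(K, a=A, b=B, q=Q, r=R):
    """One Newton iteration for solving K - F(K) = 0, starting at K."""
    return K - (K - riccati(K, a, b, q, r)) / (1.0 - riccati_slope(K, a, b, r))


def lookahead_gain(K_tilde, a=A, b=B, r=R):
    """On-line play: gain L of u = -L x minimizing q x^2 + r u^2 + K_tilde (a x + b u)^2."""
    return a * b * K_tilde / (r + b**2 * K_tilde)


def policy_cost(L, a=A, b=B, q=Q, r=R):
    """Cost coefficient K_L of the linear policy u = -L x (requires a stable closed loop)."""
    a_cl = a - b * L
    if abs(a_cl) >= 1.0:
        raise ValueError("closed-loop system is unstable; cost is infinite")
    return (q + r * L**2) / (1.0 - a_cl**2)


def optimal_cost(a=A, b=B, q=Q, r=R):
    """Positive root of b^2 K^2 + (r - q b^2 - a^2 r) K - q r = 0, i.e. K = F(K)."""
    c1 = r - q * b**2 - a**2 * r
    return (-c1 + math.sqrt(c1**2 + 4.0 * b**2 * q * r)) / (2.0 * b**2)


if __name__ == "__main__":
    K_tilde = 0.5  # assumed off-line cost approximation
    K_star = optimal_cost()
    K_play = policy_cost(lookahead_gain(K_tilde))
    print("error of off-line approximation:", abs(K_tilde - K_star))
    print("error of on-line play policy:   ", abs(K_play - K_star))
    print("Newton step from K_tilde:       ", newton_step(K_tilde))
```

The two quantities agree because the lookahead policy's cost equation is linear in K and is tangent to F at `K_tilde`, so its fixed point is the Newton iterate. This is also why a rough off-line approximation can still yield a near-optimal on-line policy in this toy setting.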
Pages: 363-383
Number of pages: 21