Reinforcement Learning based on MPC/MHE for Unmodeled and Partially Observable Dynamics

Cited by: 14
Authors
Esfahani, Hossein Nejatbakhsh [1]
Kordabad, Arash Bahari [1]
Gros, Sebastien [1]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, Trondheim, Norway
Source
2021 AMERICAN CONTROL CONFERENCE (ACC) | 2021
Keywords
DOI
10.23919/ACC50511.2021.9483399
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper proposes an observer-based framework for solving Partially Observable Markov Decision Processes (POMDPs) when an accurate model is not available. We first propose to use a Moving Horizon Estimation-Model Predictive Control (MHE-MPC) scheme to provide a policy for the POMDP problem, where the full state of the real process is not measured or necessarily known. We propose to parameterize both the MPC and MHE formulations, with certain adjustable parameters used to tune the policy. To tackle the unmodeled and partially observable dynamics, we leverage Reinforcement Learning (RL) to tune the parameters of the MPC and MHE schemes jointly, with the closed-loop performance of the policy as the goal rather than model fitting or the MHE performance. Illustrations show that the proposed approach can effectively improve the closed-loop control performance of systems formulated as POMDPs.
Pages: 2121-2126
Number of pages: 6
Related papers
21 in total
  • [1] Azizzadenesheli K., 2019, THESIS
  • [2] Bahari Kordabad A., 2021, 2021 AM CONTR C ACC
  • [3] Bertsekas D., 2005, Athena Scientific
  • [4] Büskens C, 2001, ONLINE OPTIMIZATION OF LARGE SCALE SYSTEMS, P3
  • [5] Gangwani T., 2019, UAI
  • [6] Reinforcement Learning for mixed-integer problems based on MPC
    Gros, Sebastien
    Zanon, Mario
    [J]. IFAC PAPERSONLINE, 2020, 53 (02) : 5219 - 5224
  • [7] Data-Driven Economic NMPC Using Reinforcement Learning
    Gros, Sebastien
    Zanon, Mario
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2020, 65 (02) : 636 - 648
  • [8] Guo Zhaohan Daniel, 2018, ARXIV181106407
  • [9] Data-Driven Distributed Output Consensus Control for Partially Observable Multiagent Systems
    Jiang, He
    He, Haibo
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2019, 49 (03) : 848 - 858
  • [10] Karg B., 2018, ARXIV180610644