Learning-Based Predictive Control for Discrete-Time Nonlinear Systems With Stochastic Disturbances

Cited by: 63
Authors
Xu, Xin [1 ]
Chen, Hong [2 ,3 ]
Lian, Chuanqiang [4 ]
Li, Dazi [5 ]
Affiliations
[1] Natl Univ Def Technol, Coll Intelligence Sci, Changsha 410073, Hunan, Peoples R China
[2] Jilin Univ NanLing, State Key Lab Automot Simulat & Control, Changchun 130025, Jilin, Peoples R China
[3] Jilin Univ NanLing, Dept Control Sci & Engn, Changchun 130025, Jilin, Peoples R China
[4] Naval Univ Engn, Natl Key Lab Sci & Technol Vessel Integrated Power, Wuhan 430032, Hubei, Peoples R China
[5] Beijing Univ Chem Technol, Dept Automat, Beijing 100029, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming (ADP); function approximation; model predictive control (MPC); optimal control; receding horizon; reinforcement learning (RL); H-INFINITY CONTROL; POLICY ITERATION; CONTROL SCHEME;
DOI
10.1109/TNNLS.2018.2820019
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, a learning-based predictive control (LPC) scheme is proposed for adaptive optimal control of discrete-time nonlinear systems under stochastic disturbances. The proposed LPC scheme differs from conventional model predictive control (MPC), which relies on open-loop optimization or simplified closed-loop optimal control techniques in each horizon. In LPC, the control task in each horizon is formulated as a closed-loop nonlinear optimal control problem, and a finite-horizon iterative reinforcement learning (RL) algorithm is developed to obtain closed-loop optimal/suboptimal solutions. Therefore, in LPC, RL and adaptive dynamic programming (ADP) serve as a new class of closed-loop, learning-based optimization techniques for nonlinear predictive control under stochastic disturbances. Moreover, LPC decomposes the infinite-horizon optimal control problem of previous RL and ADP methods into a series of finite-horizon problems, so that computational costs are reduced and learning efficiency is improved. Convergence of the finite-horizon iterative RL algorithm in each prediction horizon and Lyapunov stability of the closed-loop control system are proved. In addition, by using successive policy updates between adjacent prediction horizons, LPC has lower computational costs than conventional MPC, in which the optimization procedures of different prediction horizons are independent of each other. Simulation results illustrate that, compared with conventional nonlinear MPC as well as ADP, the proposed LPC scheme achieves better performance in terms of both policy optimality and computational efficiency.
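The receding-horizon structure described in the abstract can be illustrated with a minimal sketch: at each control step, a finite-horizon closed-loop optimal control problem is solved under stochastic disturbances, and only the first-step feedback policy is applied before the horizon recedes. The sketch below is an assumption-laden toy example (scalar system, placeholder dynamics f, quadratic stage cost, grid-based backward dynamic programming with Monte Carlo averaging over disturbance samples); it is not the paper's function-approximation-based iterative RL algorithm and omits the warm-starting of policies between adjacent horizons.

# Minimal sketch of a receding-horizon (LPC-style) control loop, assuming a
# scalar discrete-time nonlinear system with additive Gaussian disturbance and
# a quadratic stage cost. All names (f, stage_cost, N, the terminal cost, grid
# sizes) are illustrative placeholders, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def f(x, u):
    # Hypothetical nonlinear dynamics: x_{k+1} = f(x_k, u_k) + w_k
    return 0.9 * x + 0.1 * np.sin(x) + u

def stage_cost(x, u):
    return x**2 + 0.1 * u**2

X = np.linspace(-3.0, 3.0, 121)        # state grid
U = np.linspace(-1.0, 1.0, 21)         # action grid
W = 0.05 * rng.standard_normal(30)     # disturbance samples for the expectation
N = 10                                 # prediction horizon length

def finite_horizon_dp(terminal_value):
    """Backward value recursion over the horizon; returns per-step feedback laws."""
    V = terminal_value.copy()
    policies = []
    for _ in range(N):
        Q = np.empty((X.size, U.size))
        for j, u in enumerate(U):
            # Expected cost-to-go, averaged over sampled disturbances
            xn = f(X[:, None], u) + W[None, :]          # shape (|X|, |W|)
            Vn = np.interp(xn, X, V).mean(axis=1)
            Q[:, j] = stage_cost(X, u) + Vn
        policies.append(U[np.argmin(Q, axis=1)])
        V = Q.min(axis=1)
    policies.reverse()                  # policies[k] is the step-k feedback law
    return policies

# Receding-horizon simulation: re-solve a finite-horizon problem at each step
# and apply only the first-step closed-loop policy, MPC-style.
x = 2.0
terminal_value = X**2                   # simple terminal cost (assumption)
for k in range(30):
    policies = finite_horizon_dp(terminal_value)
    u = np.interp(x, X, policies[0])    # evaluate the first-step feedback law
    x = f(x, u) + 0.05 * rng.standard_normal()
    print(f"step {k:2d}: x = {x:+.3f}, u = {u:+.3f}")

Applying only the first-step policy and then re-solving mirrors the receding-horizon principle shared by MPC and LPC; the paper's contribution lies in replacing the per-horizon open-loop optimization with an iterative RL/ADP procedure whose policies are reused across adjacent horizons, which this sketch does not attempt to reproduce.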
Pages: 6202-6213
Number of pages: 12
相关论文
共 41 条
  • [1] Abdulla Mohammed Shahid, 2007, 2007 American Control Conference, P534, DOI 10.1109/ACC.2007.4282587
  • [2] [Anonymous], 2004, THESIS
  • [3] [Anonymous], 1992, HDB INTELLIGENT CONT
  • [4] [Anonymous], 2010, Algorithms for Reinforcement Learning
  • [5] Bellman R. E., 1957, Dynamic programming. Princeton landmarks in mathematics
  • [6] Dynamic programming and suboptimal control: A survey from ADP to MPC
    Bertsekas, DP
    [J]. EUROPEAN JOURNAL OF CONTROL, 2005, 11 (4-5) : 310 - 334
  • [7] Natural actor-critic algorithms
    Bhatnagar, Shalabh
    Sutton, Richard S.
    Ghavamzadeh, Mohammad
    Lee, Mark
    [J]. AUTOMATICA, 2009, 45 (11) : 2471 - 2482
  • [8] A Probabilistic Particle-Control Approximation of Chance-Constrained Stochastic Predictive Control
    Blackmore, Lars
    Ono, Masahiro
    Bektassov, Askar
    Williams, Brian C.
    [J]. IEEE TRANSACTIONS ON ROBOTICS, 2010, 26 (03) : 502 - 517
  • [9] A quasi-infinite horizon nonlinear model predictive control scheme with guaranteed stability
    Chen, H
    Allgower, F
    [J]. AUTOMATICA, 1998, 34 (10) : 1205 - 1217
  • [10] A feasible moving horizon H∞ control scheme for constrained uncertain linear systems
    Chen, Hong
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2007, 52 (02) : 343 - 348