A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Cited by: 0
Authors
Beckenbach, Lukas [1 ]
Osinenko, Pavel [2 ]
Streif, Stefan [1 ]
Affiliations
[1] Tech Univ Chemnitz, Automatic Control & Dynam Syst Lab, Chemnitz, Germany
[2] Skolkovo Inst Sci & Technol, Digital Engn Ctr, Moscow, Russia
Keywords
adaptive control; approximate dynamic programming; optimal control; reinforcement learning; TIME NONLINEAR-SYSTEMS; APPROXIMATE OPTIMAL-CONTROL; CONVERSE LYAPUNOV THEOREM; ADAPTIVE OPTIMAL-CONTROL; ONLINE OPTIMAL-CONTROL; INPUT; ALGORITHM; SAFE; MPC;
DOI
10.1002/rnc.7626
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline code
0812
Abstract
Reinforcement learning is commonly associated with the training of reward-maximizing (or cost-minimizing) agents, in other words, controllers. It can be applied in a model-free or model-based fashion, using a priori or online-collected system data to train the parametric architectures involved. In general, online reinforcement learning does not guarantee closed-loop stability unless special measures are taken, for instance, through learning constraints or tailored training rules. Hybrids of reinforcement learning with classical control approaches are particularly promising in this regard. In this work, we suggest a method to guarantee practical stability of the system-controller closed loop in a purely online learning setting, that is, without offline training. Moreover, we assume only partial knowledge of the system model. To achieve the claimed results, we employ techniques of classical adaptive control. The overall control scheme is implemented explicitly in a digital, sampled setting: the controller receives the state of the system and computes the control action at discrete, specifically equidistant, moments in time. The method is tested on adaptive traction control and cruise control, where it is shown to significantly reduce the cost.
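To make the structure described in the abstract concrete, the following is a minimal Python sketch of an online, sampled-time learning loop with a Lyapunov-based stability safeguard. Everything in it is an illustrative assumption rather than the construction from the paper: the toy scalar plant f with partially unknown parameter A_TRUE, the quadratic Lyapunov candidate V, the fallback gain K, the certainty-equivalence policy, and the learning rates are all hypothetical stand-ins.

```python
# Minimal sketch of an online, sampled-time learning loop with a
# Lyapunov-based stability safeguard. All names below (the toy scalar
# dynamics f, the quadratic candidate V, the fallback gain K, the
# certainty-equivalence policy, the learning rates) are illustrative
# assumptions, not the controller or the guarantees from the paper.

A_TRUE = 0.5        # unknown part of the model (hidden from the controller)
DT = 0.1            # equidistant sampling period

def f(x, u):
    """Toy scalar plant, sampled with period DT (Euler discretization)."""
    return x + DT * (A_TRUE * x + u)

def V(x):
    """Candidate Lyapunov function (assumed quadratic)."""
    return x * x

def fallback(x, K=2.0):
    """Classical stabilizing feedback used as a safety backup."""
    return -K * x

def learned_policy(x, a_hat):
    """Certainty-equivalence control built from the current model estimate."""
    return -(a_hat + 1.0) * x   # aims at closed-loop dynamics ~ -x

x, a_hat = 5.0, 0.0
for k in range(200):                         # sampling instants t_k = k * DT
    u_candidate = learned_policy(x, a_hat)
    # Accept the learned action only if the current model estimate predicts
    # a sufficient Lyapunov decrease; otherwise use the classical fallback.
    x_pred = x + DT * (a_hat * x + u_candidate)
    u = u_candidate if V(x_pred) <= (1.0 - 0.1 * DT) * V(x) else fallback(x)
    x_next = f(x, u)                         # plant evolves with the true model
    # Adaptive-control-style update: gradient step on the one-step
    # prediction error of the partially unknown parameter.
    err = x_next - (x + DT * (a_hat * x + u))
    a_hat += 0.5 * err * DT * x
    x = x_next

print(f"final state {x:.4f}, model estimate {a_hat:.3f}")
```

The design point this sketch illustrates is the hybrid structure the abstract describes: a classical adaptive mechanism estimates the unknown model part and certifies a Lyapunov decrease at every sampling instant, and the learned action is applied only when it passes that check, so the closed loop remains practically stable while learning proceeds purely online.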
Pages: 12389 - 12412
Number of pages: 24