Model-based reinforcement learning for approximate optimal regulation

Cited by: 156
Authors
Kamalapurkar, Rushikesh [1 ]
Walters, Patrick [1 ]
Dixon, Warren E. [1 ]
Affiliations
[1] Univ Florida, Dept Mech & Aerosp Engn, Gainesville, FL, USA
Funding
U.S. National Science Foundation;
Keywords
Model-based reinforcement learning; Concurrent learning; Simulated experience; Data-based control; Adaptive control; System identification; OPTIMAL TRACKING CONTROL; DISCRETE-TIME-SYSTEMS; ZERO-SUM GAMES;
DOI
10.1016/j.automatica.2015.10.039
Chinese Library Classification (CLC)
TP [Automation technology; computer technology];
Discipline code
0812;
Abstract
Reinforcement learning (RL)-based online approximate optimal control methods applied to deterministic systems typically require a restrictive persistence of excitation (PE) condition for convergence. This paper develops a concurrent learning (CL)-based implementation of model-based RL to solve approximate optimal regulation problems online under a PE-like rank condition. The development is based on the observation that, given a model of the system, RL can be implemented by evaluating the Bellman error at any number of desired points in the state space. In this result, a parametric system model is considered, and a CL-based parameter identifier is developed to compensate for uncertainty in the parameters. Uniformly ultimately bounded regulation of the system states to a neighborhood of the origin, and convergence of the developed policy to a neighborhood of the optimal policy are established using a Lyapunov-based analysis, and simulation results are presented to demonstrate the performance of the developed controller. (C) 2015 Elsevier Ltd. All rights reserved.
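
The abstract's central idea, evaluating the Bellman error at user-selected points by using an identified system model, can be sketched as follows. The notation (critic/actor weight estimates \hat{W}_c and \hat{W}_a, basis \sigma, regressor Y, parameter estimate \hat{\theta}, and extrapolation points x_i) is assumed here for illustration and is not quoted from the paper; it presumes control-affine dynamics \dot{x} = f(x) + g(x)u with an infinite-horizon quadratic cost.

% Minimal sketch (assumed notation): value and policy approximations,
% identified drift dynamics \hat{f}(x) = Y(x)\hat{\theta}, and the resulting Bellman error.
\begin{align*}
  \hat{V}(x)      &= \hat{W}_c^{\top}\sigma(x), \\
  \hat{u}(x)      &= -\tfrac{1}{2}\,R^{-1}g(x)^{\top}\nabla\sigma(x)^{\top}\hat{W}_a, \\
  \hat{\delta}(x) &= \hat{W}_c^{\top}\nabla\sigma(x)\bigl(Y(x)\hat{\theta} + g(x)\hat{u}(x)\bigr)
                     + x^{\top}Q\,x + \hat{u}(x)^{\top}R\,\hat{u}(x).
\end{align*}
% Because the identified model \hat{f} is available, \hat{\delta} can be evaluated not only along
% the measured trajectory x(t) but also at selected points {x_i, i = 1,...,N}, which is what allows
% the persistence of excitation requirement to be replaced by a rank condition on those points.

Under these assumptions, the CL-based identifier supplies \hat{\theta}, and the critic/actor updates are driven by \hat{\delta} evaluated both on-trajectory and at the extrapolated points.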
Pages: 94-104
Number of pages: 11