Model-based reinforcement learning for approximate optimal regulation

Cited by: 156
Authors
Kamalapurkar, Rushikesh [1 ]
Walters, Patrick [1 ]
Dixon, Warren E. [1 ]
Affiliations
[1] Univ Florida, Dept Mech & Aerosp Engn, Gainesville, FL, USA
Funding
U.S. National Science Foundation;
Keywords
Model-based reinforcement learning; Concurrent learning; Simulated experience; Data-based control; Adaptive control; System identification; OPTIMAL TRACKING CONTROL; DISCRETE-TIME-SYSTEMS; ZERO-SUM GAMES;
DOI
10.1016/j.automatica.2015.10.039
Chinese Library Classification (CLC)
TP [Automation technology; computer technology];
Discipline code
0812;
Abstract
Reinforcement learning (RL)-based online approximate optimal control methods applied to deterministic systems typically require a restrictive persistence of excitation (PE) condition for convergence. This paper develops a concurrent learning (CL)-based implementation of model-based RL to solve approximate optimal regulation problems online under a PE-like rank condition. The development is based on the observation that, given a model of the system, RL can be implemented by evaluating the Bellman error at any number of desired points in the state space. In this result, a parametric system model is considered, and a CL-based parameter identifier is developed to compensate for uncertainty in the parameters. Uniformly ultimately bounded regulation of the system states to a neighborhood of the origin, and convergence of the developed policy to a neighborhood of the optimal policy are established using a Lyapunov-based analysis, and simulation results are presented to demonstrate the performance of the developed controller. (C) 2015 Elsevier Ltd. All rights reserved.
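
The abstract's central idea, evaluating the Bellman error at user-selected points by using an identified system model, can be sketched as follows. The notation (critic/actor weight estimates \hat{W}_c and \hat{W}_a, basis \sigma, regressor Y, parameter estimate \hat{\theta}, and extrapolation points x_i) is assumed here for illustration and is not quoted from the paper; it presumes control-affine dynamics \dot{x} = f(x) + g(x)u with an infinite-horizon quadratic cost.

% Minimal sketch (assumed notation): value and policy approximations,
% identified drift dynamics \hat{f}(x) = Y(x)\hat{\theta}, and the resulting Bellman error.
\begin{align*}
  \hat{V}(x)      &= \hat{W}_c^{\top}\sigma(x), \\
  \hat{u}(x)      &= -\tfrac{1}{2}\,R^{-1}g(x)^{\top}\nabla\sigma(x)^{\top}\hat{W}_a, \\
  \hat{\delta}(x) &= \hat{W}_c^{\top}\nabla\sigma(x)\bigl(Y(x)\hat{\theta} + g(x)\hat{u}(x)\bigr)
                     + x^{\top}Q\,x + \hat{u}(x)^{\top}R\,\hat{u}(x).
\end{align*}
% Because the identified model \hat{f} is available, \hat{\delta} can be evaluated not only along
% the measured trajectory x(t) but also at selected points {x_i, i = 1,...,N}, which is what allows
% the persistence of excitation requirement to be replaced by a rank condition on those points.

Under these assumptions, the CL-based identifier supplies \hat{\theta}, and the critic/actor updates are driven by \hat{\delta} evaluated both on-trajectory and at the extrapolated points.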
Pages: 94-104
Number of pages: 11