Reinforcement Learning for Linear Continuous-time Systems: an Incremental Learning Approach

Cited by: 22

Authors:

Bian, Tao [1]

Jiang, Zhong-Ping [2]

Affiliations:

[1] Bank Amer Merrill Lynch, One Bryant Pk, New York, NY 10036 USA

[2] NYU, Tandon Sch Engn, Dept Elect & Comp Engn, Control & Networks Lab, 5 Metrotech Ctr, Brooklyn, NY 11201 USA

Source:

IEEE-CAA JOURNAL OF AUTOMATICA SINICA | 2019 / Vol. 6 / Issue 02

Funding:

U.S. National Science Foundation;

Keywords:

Adaptive optimal control; robust dynamic programming; value iteration (VI); ADAPTIVE OPTIMAL-CONTROL; STABILIZATION; STATE;

DOI:

10.1109/JAS.2019.1911390

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In this paper, we introduce a novel reinforcement learning (RL) scheme for linear continuous-time dynamical systems. Unlike traditional batch learning algorithms, the scheme takes an incremental learning approach, which provides a more efficient way to tackle the online learning problem in real-world applications. We provide rigorous convergence and robustness analysis of this incremental learning algorithm. An extension to solving robust optimal control problems is also given. Two simulation examples illustrate the effectiveness of our theoretical results.


Pages: 433-440

Number of pages: 8
