Reinforcement Learning for Linear Continuous-time Systems: an Incremental Learning Approach

Cited by: 22
Authors
Bian, Tao [1]
Jiang, Zhong-Ping [2]
Affiliations
[1] Bank Amer Merrill Lynch, One Bryant Pk, New York, NY 10036 USA
[2] NYU, Tandon Sch Engn, Dept Elect & Comp Engn, Control & Networks Lab, 5 Metrotech Ctr, Brooklyn, NY 11201 USA
Funding
US National Science Foundation;
Keywords
Adaptive optimal control; robust dynamic programming; value iteration (VI); stabilization; state
DOI
10.1109/JAS.2019.1911390
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we introduce a novel reinforcement learning (RL) scheme for linear continuous-time dynamical systems. Unlike traditional batch learning algorithms, the scheme takes an incremental learning approach, which provides a more efficient way to tackle the online learning problem in real-world applications. We provide concrete convergence and robustness analyses of this incremental learning algorithm. An extension to robust optimal control problems is also given. Two simulation examples illustrate the effectiveness of our theoretical results.
Pages: 433-440
Page count: 8