Continuous-Time Q-Learning for Infinite-Horizon Discounted Cost Linear Quadratic Regulator Problems

Cited by: 75
Authors
Palanisamy, Muthukumar [1 ,2 ]
Modares, Hamidreza [2 ]
Lewis, Frank L. [2 ]
Aurangzeb, Muhammad [2 ]
Affiliations
[1] Gandhigram Rural Inst Deemed Univ, Dept Math, Gandhigram 624302, India
[2] Univ Texas Arlington Res Inst, Ft Worth, TX 76118 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Approximate dynamic programming (ADP); continuous-time dynamical systems; infinite-horizon discounted cost function; integral reinforcement learning (IRL); optimal control; Q-learning; value iteration (VI); adaptive optimal control
DOI
10.1109/TCYB.2014.2322116
CLC number
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
This paper presents a Q-learning method for solving the infinite-horizon discounted linear quadratic regulator (LQR) problem for continuous-time (CT) continuous-state systems. Most existing methods for solving the LQR problem for CT systems require partial or complete knowledge of the system dynamics. Q-learning is effective for unknown dynamical systems, but it has been well understood mainly for discrete-time systems. The contribution of this paper is a Q-learning methodology for CT systems that solves the LQR problem without any knowledge of the system dynamics. A natural and rigorously justified parameterization of the Q-function is given in terms of the state, the control input, and its derivatives. This parameterization allows the implementation of an online Q-learning algorithm for CT systems. Simulation results supporting the theoretical development are also presented.
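The record itself contains no equations, but the problem the abstract describes is the standard infinite-horizon discounted LQR for continuous-time linear dynamics. As a sketch in generic LQR notation (the symbols A, B, Q, R, gamma and the block partition of H below are standard conventions, not taken from the paper):

\[
\dot{x}(t) = A x(t) + B u(t), \qquad
V(x_0) = \int_0^\infty e^{-\gamma t}\left( x^\top Q x + u^\top R u \right) dt,
\]
\[
\mathcal{Q}(x, u) =
\begin{bmatrix} x \\ u \end{bmatrix}^\top
\begin{bmatrix} H_{xx} & H_{xu} \\ H_{ux} & H_{uu} \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix},
\qquad
u^* = -H_{uu}^{-1} H_{ux}\, x,
\]

so learning the kernel matrix H from measured data yields the optimal gain without knowledge of A or B. For intuition only, the following minimal Python sketch runs the discrete-time analogue (Bradtke-style Q-learning policy iteration for LQR, the classical discrete-time precursor the abstract alludes to); the plant matrices, discount factor, and all hyperparameters are illustrative assumptions, not values from the paper.

import numpy as np

# Illustrative discrete-time analogue (Bradtke-style Q-learning policy
# iteration for LQR). All matrices and hyperparameters are assumed for
# demonstration; the paper itself treats the continuous-time case.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])      # plant, used only to generate data
B = np.array([[0.0],
              [0.1]])
Qw = np.eye(2)                  # state cost weight
Rw = np.eye(1)                  # control cost weight
gamma = 0.95                    # discount factor
n, m = 2, 1

def phi(x, u):
    # Quadratic basis: vec of the outer product of z = [x; u].
    z = np.concatenate([x, u])
    return np.outer(z, z).ravel()

rng = np.random.default_rng(0)
K = np.zeros((m, n))            # initial gain (A is stable, so K = 0 works)

for _ in range(8):              # policy iteration
    rows, costs = [], []
    x = rng.standard_normal(n)
    for t in range(600):        # collect data under exploratory inputs
        u = -K @ x + 0.3 * rng.standard_normal(m)
        c = x @ Qw @ x + u @ Rw @ u
        x_next = A @ x + B @ u
        u_next = -K @ x_next    # on-policy action at the next state
        # Bellman equation: Q_K(x, u) = c + gamma * Q_K(x', -K x')
        rows.append(phi(x, u) - gamma * phi(x_next, u_next))
        costs.append(c)
        x = rng.standard_normal(n) if t % 50 == 49 else x_next
    # Policy evaluation: least-squares fit of the Q-function kernel H
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
    H = 0.5 * (h.reshape(n + m, n + m) + h.reshape(n + m, n + m).T)
    # Policy improvement: argmin_u Q(x, u) gives u = -Huu^{-1} Hux x
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

print("learned gain K:", K)

The least-squares step exploits the fact that, for deterministic linear dynamics, the Bellman equation holds exactly for every exploratory input u, so the probing noise provides excitation without biasing the fit of H.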
Pages: 165-176 (12 pages)