Policy Gradient Reinforcement Learning Method for Discrete-Time Linear Quadratic Regulation Problem Using Estimated State Value Function

Times cited: 0
Authors
Sasaki, Tomotake [1, 2]
Uchibe, Eiji [3, 4]
Iwane, Hidenao [1, 5]
Yanami, Hitoshi [1]
Anai, Hirokazu [1, 6]
Doya, Kenji [4]
Affiliations
[1] Fujitsu Labs Ltd, Artificial Intelligence Platform Project, Kawasaki, Kanagawa, Japan
[2] MIT, McGovern Inst Brain Res, Cambridge, MA 02139 USA
[3] Adv Telecommun Res Inst Int, Dept Brain Robot Interface, Kyoto, Japan
[4] Grad Univ, Neural Computat Unit, Okinawa Inst Sci & Technol, Onna, Okinawa, Japan
[5] Natl Inst Informat, Org Management & Outside Collaborat R&D, Tokyo, Japan
[6] Kyushu Univ, Inst Math Ind, Fukuoka, Japan
Source
2017 56th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE), 2017
Keywords
reinforcement learning; adaptive control; policy gradient method; linear quadratic regulation problem
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we propose a policy gradient reinforcement learning method that directly estimates, from measurable data, the gradient of the state value function (V-function) with respect to the feedback coefficient matrix, and uses this gradient for policy improvement. The proposed method is applicable even when the state-action value function (Q-function) is difficult to estimate, and it updates the policy in a direction that effectively reduces the accumulated cost.
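The abstract describes estimating the gradient of the V-function with respect to the feedback matrix from measured data. As a reference point for the quantity such an estimator targets, the following minimal sketch computes that gradient exactly when the model is known. It is not the authors' data-driven method. It assumes known matrices A, B, Q, R, the control law u_t = -K x_t, the undiscounted cost V_K(x0) = sum_t (x_t^T Q x_t + u_t^T R u_t), and a stabilizing initial gain, and it uses the standard Lyapunov-equation identity grad_K V_K(x0) = 2[(R + B^T P_K B) K - B^T P_K A] Sigma_K, where P_K is the closed-loop cost matrix and Sigma_K = sum_t x_t x_t^T. All function names are hypothetical.

```python
import numpy as np


def closed_loop(A, B, K):
    # Closed-loop dynamics matrix for the assumed control law u_t = -K x_t.
    return A - B @ K


def is_stable(Acl):
    # Discrete-time stability: spectral radius strictly below 1.
    return np.max(np.abs(np.linalg.eigvals(Acl))) < 1.0


def solve_stein(M, C):
    # Solve X = M^T X M + C by vectorization (valid when M is Schur stable).
    # With row-major vec, vec(M^T X M) = kron(M^T, M^T) vec(X).
    n = M.shape[0]
    lhs = np.eye(n * n) - np.kron(M.T, M.T)
    return np.linalg.solve(lhs, C.reshape(-1)).reshape(n, n)


def value_and_grad(A, B, Q, R, K, x0):
    # Returns V_K(x0) and its gradient dV_K(x0)/dK (model-based, not estimated).
    Acl = closed_loop(A, B, K)
    # P_K solves P = Q + K^T R K + Acl^T P Acl, so V_K(x0) = x0^T P_K x0.
    P = solve_stein(Acl, Q + K.T @ R @ K)
    # Sigma_K = sum_t x_t x_t^T solves Sigma = x0 x0^T + Acl Sigma Acl^T.
    Sigma = solve_stein(Acl.T, np.outer(x0, x0))
    value = float(x0 @ P @ x0)
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return value, grad


def policy_gradient_lqr(A, B, Q, R, K0, x0, step=1e-2, iters=200):
    # Plain gradient descent on V_K(x0) with a backtracking step size.
    if not is_stable(closed_loop(A, B, K0)):
        raise ValueError("K0 must be stabilizing")
    K = K0.copy()
    value, grad = value_and_grad(A, B, Q, R, K, x0)
    for _ in range(iters):
        eta = step
        # Shrink the step until the new gain is stabilizing and lowers V_K(x0).
        while eta > 1e-12:
            K_new = K - eta * grad
            if is_stable(closed_loop(A, B, K_new)):
                v_new, g_new = value_and_grad(A, B, Q, R, K_new, x0)
                if v_new < value:
                    K, value, grad = K_new, v_new, g_new
                    break
            eta *= 0.5
        else:
            break  # no improving step found; stop early
    return K, value


if __name__ == "__main__":
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    K0 = np.array([[1.0, 2.0]])  # closed-loop eigenvalues are 0.9, 0.9
    x0 = np.array([1.0, 0.0])
    v0, _ = value_and_grad(A, B, Q, R, K0, x0)
    K, v = policy_gradient_lqr(A, B, Q, R, K0, x0)
    print(f"V(K0) = {v0:.4f}, V(K) = {v:.4f}, K = {K}")
```

The backtracking step keeps every iterate stabilizing, because P_K and Sigma_K are finite only for a stabilizing gain. With a single initial state, Sigma_K can be poorly conditioned, so the loop only guarantees that V_K(x0) does not increase; it is not guaranteed to reach the optimal LQR gain.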
Pages: 653-657
Number of pages: 5