Policy Gradient Reinforcement Learning Method for Discrete-Time Linear Quadratic Regulation Problem Using Estimated State Value Function

Cited by: 0

Authors:

Sasaki, Tomotake [1,2]

Uchibe, Eiji [3,4]

Iwane, Hidenao [1,5]

Yanami, Hitoshi [1]

Anai, Hirokazu [1,6]

Doya, Kenji [4]

Affiliations:

[1] Fujitsu Labs Ltd, Artificial Intelligence Platform Project, Kawasaki, Kanagawa, Japan

[2] MIT, McGovern Inst Brain Res, Cambridge, MA 02139 USA

[3] Adv Telecommun Res Inst Int, Dept Brain Robot Interface, Kyoto, Japan

[4] Okinawa Inst Sci & Technol Grad Univ, Neural Computat Unit, Onna, Okinawa, Japan

[5] Natl Inst Informat, Org Management & Outside Collaborat R&D, Tokyo, Japan

[6] Kyushu Univ, Inst Math Ind, Fukuoka, Japan

Source:

2017 56TH ANNUAL CONFERENCE OF THE SOCIETY OF INSTRUMENT AND CONTROL ENGINEERS OF JAPAN (SICE) | 2017

Keywords:

reinforcement learning; adaptive control; policy gradient method; linear quadratic regulation problem

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, we propose a policy gradient reinforcement learning method that directly estimates the gradient of the state value function (V-function) with respect to a feedback coefficient matrix using measurable data and uses it for policy improvement. The proposed method is applicable to cases where the state-action value function (Q-function) is difficult to estimate, and it can update the policy in a direction that effectively reduces the accumulated cost.
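To make the idea of improving a linear feedback policy along an estimated V-function gradient concrete, here is a minimal sketch for a scalar discrete-time LQR problem. It is not the estimator proposed in the paper, whose details are not given in this record: the gradient is approximated here by a central finite difference over simulated rollouts, and the system parameters `a`, `b`, the cost weights `q`, `r`, the step size, and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch only: scalar LQR with a finite-difference gradient
# estimate. This is NOT the estimator from the paper; all names and
# numerical values below are assumptions made for the example.

def rollout_cost(k, x0=1.0, a=1.1, b=1.0, q=1.0, r=1.0, horizon=200):
    """Accumulated cost of x_next = a*x + b*u under the policy u = -k*x."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x                      # linear state feedback
        cost += q * x * x + r * u * u   # quadratic stage cost
        x = a * x + b * u               # system transition
    return cost


def estimate_value_gradient(k, x0=1.0, delta=1e-4):
    """Central finite-difference estimate of dV/dk from two rollouts."""
    upper = rollout_cost(k + delta, x0)
    lower = rollout_cost(k - delta, x0)
    return (upper - lower) / (2.0 * delta)


def improve_policy(k, x0=1.0, step=0.05, iterations=200):
    """Gradient descent on the feedback gain using the estimated gradient."""
    for _ in range(iterations):
        k -= step * estimate_value_gradient(k, x0)
    return k


if __name__ == "__main__":
    k_init = 1.0                        # a stabilizing initial gain (assumed)
    k_final = improve_policy(k_init)
    print("initial cost:", rollout_cost(k_init))
    print("final gain:", k_final, "final cost:", rollout_cost(k_final))
```

The sketch uses only the accumulated cost observed along trajectories and never forms a Q-function, which mirrors the setting the abstract describes. The initial gain must be stabilizing (here `|a - b*k| < 1`) for the rollout cost to stay bounded.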

Citation

Pages: 653-657

Page count: 5
