Actor-Critic Algorithm with Transition Cost Estimation

Cited by: 0
Authors
Denisov, Sergey [1]
Lee, Jee-Hyong [1 ]
Affiliations
[1] Sungkyunkwan Univ, Dept Elect & Comp Engn, Suwon, South Korea
Funding
National Research Foundation of Singapore;
关键词
Actor-critic algorithm; Reinforcement learning; Continuous action space; Heuristic function;
DOI
10.5391/IJFIS.2016.16.4.270
CLC number
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
We present an approach for accelerating the actor-critic algorithm for reinforcement learning with continuous action spaces. The actor-critic algorithm has already proved robust to infinitely large action spaces in various high-dimensional environments. Despite that success, its main problem remains the speed of convergence to the optimal policy: in a high-dimensional state and action space, searching for the correct action in each state takes an enormously long time. In this paper we therefore propose a search-accelerating function that increases the speed of convergence, so the algorithm reaches the optimal policy faster. In our method, we assume that actions may have their own preference distribution that is independent of the state. Since the agent acts randomly in the environment at the beginning of learning, it is more efficient to take actions according to some heuristic function during that phase. We demonstrate that the heuristically-accelerated actor-critic algorithm learns the optimal policy faster, using an Educational Process Mining dataset with records of students' course learning process and their grades.
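The abstract describes biasing early action selection with a state-independent heuristic preference over actions. A minimal sketch of that idea, assuming a Gaussian actor policy blended with a Gaussian heuristic whose weight `xi` is annealed toward zero over training (the function names, the Gaussian forms, and the linear blending rule are illustrative assumptions, not the paper's exact method):

```python
import random
import statistics

random.seed(0)

def select_action(actor_mean, actor_std, heur_mean, heur_std, xi):
    # Blend the actor's Gaussian policy with a state-independent
    # heuristic action-preference distribution. xi in [0, 1] weights
    # the heuristic; annealing xi toward 0 hands control back to the
    # learned actor as training progresses.
    mean = (1.0 - xi) * actor_mean + xi * heur_mean
    std = (1.0 - xi) * actor_std + xi * heur_std
    return random.gauss(mean, std)

# Heuristic dominates early (xi near 1) and is annealed away later.
xi_early = 1.0
xi_late = 0.99 ** 500  # exponential decay after 500 episodes

early = [select_action(0.0, 1.0, 2.0, 0.5, xi_early) for _ in range(2000)]
late = [select_action(0.0, 1.0, 2.0, 0.5, xi_late) for _ in range(2000)]

# Early actions cluster around the heuristic's preferred region (mean 2.0);
# late actions follow the actor's own policy (mean 0.0).
print(round(statistics.mean(early), 1), round(statistics.mean(late), 1))
```

The design point is that the heuristic only shapes exploration: it never alters the critic's value estimates, so once `xi` has decayed the agent converges to whatever the actor-critic updates would have found anyway, just from a better-explored starting region.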
Pages: 270-275
Number of pages: 6
Related papers
13 items
[1]   Heuristically-Accelerated Multiagent Reinforcement Learning [J].
Bianchi, Reinaldo A. C. ;
Martins, Murilo F. ;
Ribeiro, Carlos H. C. ;
Costa, Anna H. R. .
IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (02) :252-265
[2]  
Celiberto LA, 2008, LECT NOTES COMPUT SC, V5001, P220
[3]  
Ernst D, 2005, J MACH LEARN RES, V6, P503
[4]  
Hunt J.J., Continuous Control with Deep Reinforcement Learning
[5]  
Mnih V., Playing Atari with Deep Reinforcement Learning
[6]   Human-level control through deep reinforcement learning [J].
Mnih, Volodymyr ;
Kavukcuoglu, Koray ;
Silver, David ;
Rusu, Andrei A. ;
Veness, Joel ;
Bellemare, Marc G. ;
Graves, Alex ;
Riedmiller, Martin ;
Fidjeland, Andreas K. ;
Ostrovski, Georg ;
Petersen, Stig ;
Beattie, Charles ;
Sadik, Amir ;
Antonoglou, Ioannis ;
King, Helen ;
Kumaran, Dharshan ;
Wierstra, Daan ;
Legg, Shane ;
Hassabis, Demis .
NATURE, 2015, 518 (7540) :529-533
[7]  
Riedmiller M, 2005, LECT NOTES ARTIF INT, V3720, P317, DOI 10.1007/11564096_32
[8]  
Silver D., 2014, ICML '14, P387
[9]  
Sutton RS, 1996, ADV NEUR IN, V8, P1038
[10]  
Sutton RS, 2000, ADV NEUR IN, V12, P1057