Importance sampling policy gradient algorithms in reproducing kernel Hilbert space

Cited by: 0
Authors
Tuyen Pham Le
Vien Anh Ngo
P. Marlith Jaramillo
TaeChoong Chung
Affiliations
[1] Kyung Hee University,Artificial Intelligence Lab, Computer Science and Engineering Department
[2] Queen’s University Belfast,EEECS/ECIT
Source
Artificial Intelligence Review | 2019, Vol. 52
Keywords
Reproducing kernel Hilbert space; Policy search; Reinforcement learning; Importance sampling; Policy gradient; Non-parametric
DOI
Not available
Abstract
Modeling policies in a reproducing kernel Hilbert space (RKHS) yields a flexible and powerful family of policy gradient algorithms, called RKHS policy gradient algorithms, which optimize over spaces of very high- or infinite-dimensional policies. However, they are known to suffer from high estimation variance. This critical issue stems from the fact that the current policy is updated using a functional gradient that does not exploit the old episodes sampled by previous policies. In this paper, we introduce a generalized RKHS policy gradient algorithm that integrates three key ideas: (i) policy modeling in RKHS; (ii) normalized importance sampling, which reduces the estimation variance by reusing previously sampled episodes in a principled way; and (iii) regularization terms, which prevent the updated policy from over-fitting to the sampled data. In the experiments, we analyze the proposed algorithm on benchmark domains. The results show that it retains the powerful policy modeling of RKHS while achieving greater data efficiency.
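The two core ingredients named in the abstract can be illustrated with a minimal sketch: a non-parametric policy mean represented as a kernel expansion in RKHS, and self-normalized importance weights for reusing episodes sampled under earlier policies. This is an illustrative approximation, not the paper's algorithm; the function names (`rkhs_mean`, `normalized_is_weights`, `is_return_estimate`), the Gaussian RBF kernel, and the bandwidth parameter are assumptions made for the example.

```python
import numpy as np

def rbf(x, y, bw=1.0):
    """Gaussian RBF kernel k(x, y) with bandwidth bw (assumed choice)."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * bw ** 2))

def rkhs_mean(state, centers, alphas, bw=1.0):
    """Non-parametric policy mean h(s) = sum_i alpha_i * k(s_i, s).

    The policy lives in an RKHS: it is a weighted combination of kernels
    anchored at previously visited states, so its capacity grows with data.
    """
    return sum(a * rbf(c, state, bw) for c, a in zip(centers, alphas))

def normalized_is_weights(log_p_new, log_p_old):
    """Self-normalized importance weights for episodes sampled by old policies.

    log_p_new / log_p_old are the log-probabilities of each sampled episode
    under the current and behaviour policies.  Normalizing the weights to
    sum to one trades a small bias for a large variance reduction.
    """
    log_w = np.asarray(log_p_new, float) - np.asarray(log_p_old, float)
    log_w -= log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()            # self-normalization

def is_return_estimate(log_p_new, log_p_old, returns):
    """Estimate the current policy's expected return from old episodes."""
    w = normalized_is_weights(log_p_new, log_p_old)
    return float(np.dot(w, np.asarray(returns, float)))
```

When the current and behaviour policies coincide, the weights are uniform and the estimate reduces to the plain sample mean of the returns; as the policies diverge, episodes that are unlikely under the current policy are automatically down-weighted.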
Pages: 2039–2059
Page count: 20