Importance sampling policy gradient algorithms in reproducing kernel Hilbert space

Cited by: 0
Authors
Tuyen Pham Le
Vien Anh Ngo
P. Marlith Jaramillo
TaeChoong Chung
Affiliations
[1] Kyung Hee University,Artificial Intelligence Lab, Computer Science and Engineering Department
[2] Queen’s University Belfast,EEECS/ECIT
Source
Artificial Intelligence Review | 2019, Vol. 52
Keywords
Reproducing kernel Hilbert space; Policy search; Reinforcement learning; Importance sampling; Policy gradient; Non-parametric
DOI
Not available
Abstract
Modeling policies in a reproducing kernel Hilbert space (RKHS) yields a flexible and powerful family of policy gradient algorithms, called RKHS policy gradient algorithms, which optimize over spaces of very high- or infinite-dimensional policies. However, they are known to suffer from high estimation variance. This critical issue stems from the fact that the current policy is updated using a functional gradient that does not exploit the old episodes sampled by previous policies. In this paper, we introduce a generalized RKHS policy gradient algorithm that integrates three key ideas: (i) policy modeling in RKHS; (ii) normalized importance sampling, which reduces the estimation variance by reusing previously sampled episodes in a principled way; and (iii) regularization terms, which prevent the updated policy from over-fitting to the sampled data. In the experiments, we analyze the proposed algorithm on benchmark domains. The results show that it retains the powerful policy modeling of RKHS while achieving greater data efficiency.
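The two core ingredients named in the abstract can be illustrated with a minimal sketch: a non-parametric policy mean represented as a kernel expansion in RKHS, and self-normalized importance weights for reusing episodes sampled under earlier policies. This is an illustrative approximation, not the paper's algorithm; the function names (`rkhs_mean`, `normalized_is_weights`, `is_return_estimate`), the Gaussian RBF kernel, and the bandwidth parameter are assumptions made for the example.

```python
import numpy as np

def rbf(x, y, bw=1.0):
    """Gaussian RBF kernel k(x, y) with bandwidth bw (assumed choice)."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * bw ** 2))

def rkhs_mean(state, centers, alphas, bw=1.0):
    """Non-parametric policy mean h(s) = sum_i alpha_i * k(s_i, s).

    The policy lives in an RKHS: it is a weighted combination of kernels
    anchored at previously visited states, so its capacity grows with data.
    """
    return sum(a * rbf(c, state, bw) for c, a in zip(centers, alphas))

def normalized_is_weights(log_p_new, log_p_old):
    """Self-normalized importance weights for episodes sampled by old policies.

    log_p_new / log_p_old are the log-probabilities of each sampled episode
    under the current and behaviour policies.  Normalizing the weights to
    sum to one trades a small bias for a large variance reduction.
    """
    log_w = np.asarray(log_p_new, float) - np.asarray(log_p_old, float)
    log_w -= log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()            # self-normalization

def is_return_estimate(log_p_new, log_p_old, returns):
    """Estimate the current policy's expected return from old episodes."""
    w = normalized_is_weights(log_p_new, log_p_old)
    return float(np.dot(w, np.asarray(returns, float)))
```

When the current and behaviour policies coincide, the weights are uniform and the estimate reduces to the plain sample mean of the returns; as the policies diverge, episodes that are unlikely under the current policy are automatically down-weighted.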
Pages: 2039–2059
Page count: 20