共 41 条
[1]
Baxter J(2001)Experiments with infinite-horizon, policy-gradient estimation J Artif Intell Res 15 351-381
[2]
Bartlett PL(2013)The qrsim quadrotors simulator RN 13 08-142
[3]
Weaver L(2013)A survey on policy search for robotics Found Trends Robot 2 1-1410
[4]
De Nardi R(2009)Adaptive importance sampling for value function approximation in off-policy reinforcement learning Neural Netw 22 1399-1220
[5]
Deisenroth M(2008)Kernel methods in machine learning Ann Stat 3 1171-203
[6]
Neumann G(2011)Policy search for motor primitives in robotics Mach Learn 84 171-379
[7]
Peters J(2012)Reinforcement learning to adjust parametrized motor primitives to new situations Auton Robots 33 361-204
[8]
Hachiya H(2005)On learning vector-valued functions Neural Comput 17 177-128
[9]
Akiyama T(2013)A tour of modern image filtering: new insights and methods, both practical and theoretical IEEE Signal Process Mag 30 106-697
[10]
Sugiayma M(2008)Reinforcement learning of motor skills with policy gradients Neural Netw 21 682-697