Experience Selection in Deep Reinforcement Learning for Control

Cited by: 0
Authors
de Bruin, Tim [1 ]
Kober, Jens [1 ]
Tuyls, Karl [2 ,3 ]
Babuska, Robert [1 ]
Affiliations
[1] Delft Univ Technol, Cognit Robot Dept, Mekelweg 2, NL-2628 CD Delft, Netherlands
[2] DeepMind, 14 Rue Londres, F-75009 Paris, France
[3] Univ Liverpool, Dept Comp Sci, Ashton St, Liverpool L69 3BX, Merseyside, England
Keywords
reinforcement learning; deep learning; experience replay; control; robotics; GAME; GO
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Experience replay is a technique that allows off-policy reinforcement-learning methods to reuse past experiences. The stability and speed of convergence of reinforcement learning, as well as the eventual performance of the learned policy, are strongly dependent on the experiences being replayed. Which experiences are replayed depends on two important choices. The first is which and how many experiences to retain in the experience replay buffer. The second choice is how to sample the experiences that are to be replayed from that buffer. We propose new methods for the combined problem of experience retention and experience sampling. We refer to the combination as experience selection. We focus our investigation specifically on the control of physical systems, such as robots, where exploration is costly. To determine which experiences to keep and which to replay, we investigate different proxies for their immediate and long-term utility. These proxies include age, temporal difference error and the strength of the applied exploration noise. Since no currently available method works in all situations, we propose guidelines for using prior knowledge about the characteristics of the control problem at hand to choose the appropriate experience replay strategy.
Pages: 56
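
The abstract describes experience selection as two coupled choices: which transitions to retain in a finite buffer, and which to sample from that buffer for replay, guided by proxies such as age, temporal difference (TD) error, or the strength of the applied exploration noise. The following minimal Python sketch illustrates that structure; it is not the authors' implementation, and the class and method names are hypothetical. Retention here is age-based (FIFO overwrite once the buffer is full), and sampling is proportional to a stored per-transition priority such as the absolute TD error.

```python
import numpy as np


class ReplayBuffer:
    """Finite experience buffer illustrating experience selection.

    Retention proxy: age. When the buffer is full, the oldest
    transition is overwritten (FIFO).
    Sampling proxy: |TD error|. Transitions are replayed with
    probability proportional to their stored priority.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.transitions = []   # (s, a, r, s_next, done) tuples
        self.priorities = []    # one sampling weight per transition
        self.next_slot = 0      # index of the oldest entry once full

    def add(self, transition, priority=1.0):
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(priority)
        else:
            # Buffer full: overwrite the oldest transition (age-based retention).
            self.transitions[self.next_slot] = transition
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size):
        # Draw indices with probability proportional to priority;
        # uniform sampling is recovered when all priorities are equal.
        p = np.asarray(self.priorities, dtype=np.float64)
        p = p / p.sum()
        idx = np.random.choice(len(self.transitions), size=batch_size, p=p)
        return idx, [self.transitions[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        # After replaying, refresh the proxy with the new absolute TD errors.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + 1e-6  # keep every item sampleable
```

Leaving all priorities at 1.0 reduces sampling to the uniform baseline, while swapping the absolute TD error for the magnitude of the applied exploration noise changes the sampling proxy without touching the retention rule. This separation mirrors the abstract's framing of retention and sampling as two independent choices that together constitute experience selection.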