Prioritized Sampling with Intrinsic Motivation in Multi-Task Reinforcement Learning

Cited by: 1
Authors
D'Eramo, Carlo [1 ]
Chalvatzaki, Georgia [1 ]
Affiliations
[1] Tech Univ Darmstadt, Comp Sci Dept, Darmstadt, Germany
Source
2022 International Joint Conference on Neural Networks (IJCNN) | 2022
Keywords
reinforcement learning; multi-task; active sampling; intrinsic motivation;
DOI
10.1109/IJCNN55064.2022.9892973
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep Reinforcement Learning (RL) promises to lead the next advances towards the development of coveted future intelligent agents. However, the unprecedented representational power of deep function approximators, e.g. deep neural networks, comes at the cost of demanding a huge amount of experience, making deep RL impractical for applications requiring interactions with the real world. We study the problem of using samples in deep RL more efficiently, exploiting the desirable knowledge-generalization properties that result from learning multiple tasks together. The outcome of our work is the coupling of multi-task RL algorithms with a task-sampling policy based on the well-known intrinsic motivation paradigm. In particular, we leverage the TD-error of Bellman updates as an effective measure of learning progress, in order to prioritize sampling from the tasks that contribute the most to the agent's learning. This sampling strategy speeds up the learning of tasks on which the agent is making progress and postpones the learning of the remaining ones, resulting in an optimized collection of samples. Our method is supported by experimental evaluations on well-known RL control tasks, on which our approach shows superior sample-efficiency and performance compared to representative baselines. Finally, we evaluate our approach on simulated control tasks based on Quanser robotics systems, confirming its advantages over the baselines in more realistic applications as well.
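To make the abstract's idea of TD-error-driven task prioritization concrete, the following is a minimal sketch in Python, not the authors' published algorithm: it assumes a hypothetical TDErrorTaskSampler that tracks an exponential moving average of the absolute TD-error per task and samples tasks with probability proportional to that priority. The class name, the alpha/epsilon/momentum parameters, and the agent/environment calls in the usage comments are illustrative assumptions, not part of the paper.

# Minimal sketch (not the paper's exact method): prioritized task sampling
# for multi-task RL, with per-task priority given by a running estimate of
# the mean absolute TD-error observed on that task.
import numpy as np

class TDErrorTaskSampler:
    """Samples task indices with probability proportional to a TD-error-based
    priority, softened by an exponent alpha and a small epsilon floor so that
    no task is starved completely."""

    def __init__(self, n_tasks, alpha=0.6, epsilon=1e-3, momentum=0.9):
        self.priorities = np.ones(n_tasks)  # optimistic init: all tasks equally likely at first
        self.alpha = alpha                  # 0 -> uniform sampling, 1 -> fully proportional
        self.epsilon = epsilon              # floor keeping every task reachable
        self.momentum = momentum            # smoothing for the running TD-error estimate

    def probabilities(self):
        scaled = (self.priorities + self.epsilon) ** self.alpha
        return scaled / scaled.sum()

    def sample(self, rng=np.random):
        # Draw the next task to train on according to the current priorities.
        return rng.choice(len(self.priorities), p=self.probabilities())

    def update(self, task_id, td_errors):
        # Exponential moving average of |TD-error| as a crude learning-progress signal.
        new_estimate = np.mean(np.abs(td_errors))
        self.priorities[task_id] = (self.momentum * self.priorities[task_id]
                                    + (1.0 - self.momentum) * new_estimate)

# Hypothetical usage inside a generic multi-task training loop:
#   sampler = TDErrorTaskSampler(n_tasks=len(envs))
#   task = sampler.sample()
#   td_errors = agent.train_on(envs[task])   # assumed agent API returning per-batch TD-errors
#   sampler.update(task, td_errors)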
Pages: 8