Unsupervised Curricula for Visual Meta-Reinforcement Learning

Cited by: 0
Authors
Jabri, Allan [1]
Hsu, Kyle [2]
Eysenbach, Benjamin [3]
Gupta, Abhishek [1]
Levine, Sergey [1]
Finn, Chelsea [4]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94704 USA
[2] Univ Toronto, Toronto, ON, Canada
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[4] Stanford Univ, Stanford, CA 94305 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Vol. 32
Funding
U.S. National Science Foundation
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
In principle, meta-reinforcement learning algorithms leverage experience across many tasks to learn fast reinforcement learning (RL) strategies that transfer to similar tasks. However, current meta-RL approaches rely on manually-defined distributions of training tasks, and hand-crafting these task distributions can be challenging and time-consuming. Can "useful" pre-training tasks be discovered in an unsupervised manner? We develop an unsupervised algorithm for inducing an adaptive meta-training task distribution, i.e. an automatic curriculum, by modeling unsupervised interaction in a visual environment. The task distribution is scaffolded by a parametric density model of the meta-learner's trajectory distribution. We formulate unsupervised meta-RL as information maximization between a latent task variable and the meta-learner's data distribution, and describe a practical instantiation that alternates between integrating recent experience into the task distribution and meta-learning on the updated tasks. Repeating this procedure leads to iterative reorganization such that the curriculum adapts as the meta-learner's data distribution shifts. In particular, we show how discriminative clustering for visual representation can support trajectory-level task acquisition and exploration in domains with pixel observations, avoiding pitfalls of alternatives. In experiments on vision-based navigation and manipulation domains, we show that the algorithm allows for unsupervised meta-learning that transfers to downstream tasks specified by hand-crafted reward functions and serves as pre-training for more efficient supervised meta-learning of test task distributions.
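As a rough illustration of the alternating procedure described in the abstract, the Python sketch below fits a latent-variable density model (here, a Gaussian mixture) over placeholder trajectory embeddings and scores trajectories against a sampled latent task in a discriminative-clustering style. The names collect_trajectories and meta_train are hypothetical stand-ins for the meta-learner's rollout and update steps, and the real method operates on learned embeddings of pixel observations rather than random vectors; only the alternating structure follows the text.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
EMBED_DIM, NUM_TASKS, TRAJ_PER_ROUND, ROUNDS = 8, 4, 64, 3


def collect_trajectories(num, dim):
    # Hypothetical placeholder: roll out the current meta-learner and embed
    # each trajectory into a fixed-length feature vector.
    return rng.normal(size=(num, dim))


def task_reward(model, embedding, z):
    # Reward trajectories that the density model attributes to the sampled
    # latent task z (log posterior over mixture components).
    log_post = np.log(model.predict_proba(embedding[None])[0] + 1e-8)
    return log_post[z]


def meta_train(model, embeddings):
    # Hypothetical placeholder for the inner meta-RL update; here we only
    # report the average reward each latent task assigns to the current data.
    for z in range(model.n_components):
        avg = np.mean([task_reward(model, e, z) for e in embeddings])
        print(f"  task z={z}: mean reward {avg:.3f}")


for round_idx in range(ROUNDS):
    # (1) Integrate recent experience into the task distribution.
    data = collect_trajectories(TRAJ_PER_ROUND, EMBED_DIM)
    mixture = GaussianMixture(n_components=NUM_TASKS, random_state=0).fit(data)
    # (2) Meta-learn on the tasks induced by the updated mixture.
    print(f"curriculum round {round_idx}:")
    meta_train(mixture, data)

As the data distribution shifts across rounds, refitting the mixture reorganizes the latent tasks, which is the curriculum-adaptation effect the abstract refers to.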
Pages: 13