Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

Cited by: 0
Authors
Oh, Junhyuk [1 ]
Singh, Satinder [1 ]
Lee, Honglak [1 ,2 ]
Kohli, Pushmeet [3 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Google Brain, Mountain View, CA USA
[3] Microsoft Res, Mountain View, CA USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70 | 2017
Keywords
(none listed)
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks. In this problem, we consider two types of generalizations: to previously unseen instructions and to longer sequences of instructions. For generalization over unseen instructions, we propose a new objective which encourages learning correspondences between similar subtasks by making analogies. For generalization over sequential instructions, we present a hierarchical architecture where a meta controller learns to use the acquired skills for executing the instructions. To deal with delayed reward, we propose a new neural architecture in the meta controller that learns when to update the subtask, which makes learning more efficient. Experimental results on a stochastic 3D domain show that the proposed ideas are crucial for generalization to longer instructions as well as unseen instructions.
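
As a concrete illustration of the analogy-making objective mentioned in the abstract, the sketch below shows one plausible form of such a loss in PyTorch. It is not the authors' code: it assumes subtasks are described by (action, object) pairs, and all names (SubtaskEmbedding, analogy_loss), vocabulary sizes, and the contrastive margin are illustrative assumptions; the paper defines its own precise loss terms on top of the skill-learning objective.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtaskEmbedding(nn.Module):
    # Hypothetical embedder: maps an (action, object) subtask description
    # to a vector. Vocabulary sizes and width are illustrative choices.
    def __init__(self, n_actions=8, n_objects=16, dim=32):
        super().__init__()
        self.action_emb = nn.Embedding(n_actions, dim)
        self.object_emb = nn.Embedding(n_objects, dim)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, action_ids, object_ids):
        a = self.action_emb(action_ids)
        o = self.object_emb(object_ids)
        return self.proj(torch.cat([a, o], dim=-1))

def analogy_loss(emb, quad, analogous, margin=1.0):
    # Encourage phi(A) - phi(B) to match phi(C) - phi(D) when the
    # quadruple A:B :: C:D is analogous, and to differ by at least
    # `margin` (an assumed hyperparameter) when it is not.
    (aA, oA), (aB, oB), (aC, oC), (aD, oD) = quad
    diff_ab = emb(aA, oA) - emb(aB, oB)
    diff_cd = emb(aC, oC) - emb(aD, oD)
    dist = (diff_ab - diff_cd).norm(dim=-1)
    pos = dist.pow(2)                   # analogous: pull differences together
    neg = F.relu(margin - dist).pow(2)  # non-analogous: push apart
    return torch.where(analogous, pos, neg).mean()

# Example: enforce "pick up X : pick up Y :: transform X : transform Y".
emb = SubtaskEmbedding()
quad = tuple((torch.tensor([a]), torch.tensor([o]))
             for a, o in [(0, 1), (0, 2), (1, 1), (1, 2)])
loss = analogy_loss(emb, quad, torch.tensor([True]))
loss.backward()

Regularizing subtask embeddings this way is what lets a parameterized skill transfer to unseen (action, object) combinations: the embedding of a new combination lands where the analogies place it.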
Pages: 10