Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

Cited by: 0
Authors
Oh, Junhyuk [1 ]
Singh, Satinder [1 ]
Lee, Honglak [1 ,2 ]
Kohli, Pushmeet [3 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Google Brain, Mountain View, CA USA
[3] Microsoft Res, Mountain View, CA USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70 | 2017
Keywords
(none listed)
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks. In this problem, we consider two types of generalizations: to previously unseen instructions and to longer sequences of instructions. For generalization over unseen instructions, we propose a new objective which encourages learning correspondences between similar subtasks by making analogies. For generalization over sequential instructions, we present a hierarchical architecture where a meta controller learns to use the acquired skills for executing the instructions. To deal with delayed reward, we propose a new neural architecture in the meta controller that learns when to update the subtask, which makes learning more efficient. Experimental results on a stochastic 3D domain show that the proposed ideas are crucial for generalization to longer instructions as well as unseen instructions.
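
As a concrete illustration of the analogy-making objective mentioned in the abstract, the sketch below shows one plausible form of such a loss in PyTorch. It is not the authors' code: it assumes subtasks are described by (action, object) pairs, and all names (SubtaskEmbedding, analogy_loss), vocabulary sizes, and the contrastive margin are illustrative assumptions; the paper defines its own precise loss terms on top of the skill-learning objective.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtaskEmbedding(nn.Module):
    # Hypothetical embedder: maps an (action, object) subtask description
    # to a vector. Vocabulary sizes and width are illustrative choices.
    def __init__(self, n_actions=8, n_objects=16, dim=32):
        super().__init__()
        self.action_emb = nn.Embedding(n_actions, dim)
        self.object_emb = nn.Embedding(n_objects, dim)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, action_ids, object_ids):
        a = self.action_emb(action_ids)
        o = self.object_emb(object_ids)
        return self.proj(torch.cat([a, o], dim=-1))

def analogy_loss(emb, quad, analogous, margin=1.0):
    # Encourage phi(A) - phi(B) to match phi(C) - phi(D) when the
    # quadruple A:B :: C:D is analogous, and to differ by at least
    # `margin` (an assumed hyperparameter) when it is not.
    (aA, oA), (aB, oB), (aC, oC), (aD, oD) = quad
    diff_ab = emb(aA, oA) - emb(aB, oB)
    diff_cd = emb(aC, oC) - emb(aD, oD)
    dist = (diff_ab - diff_cd).norm(dim=-1)
    pos = dist.pow(2)                   # analogous: pull differences together
    neg = F.relu(margin - dist).pow(2)  # non-analogous: push apart
    return torch.where(analogous, pos, neg).mean()

# Example: enforce "pick up X : pick up Y :: transform X : transform Y".
emb = SubtaskEmbedding()
quad = tuple((torch.tensor([a]), torch.tensor([o]))
             for a, o in [(0, 1), (0, 2), (1, 1), (1, 2)])
loss = analogy_loss(emb, quad, torch.tensor([True]))
loss.backward()

Regularizing subtask embeddings this way is what lets a parameterized skill transfer to unseen (action, object) combinations: the embedding of a new combination lands where the analogies place it.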
Pages: 10