Teacher-Student Curriculum Learning

Cited by: 168
Authors
Matiisen, Tambet [1 ,2 ]
Oliver, Avital [3 ]
Cohen, Taco [4 ]
Schulman, John [2 ]
Affiliations
[1] Univ Tartu, Inst Comp Sci, EE-51005 Tartu, Estonia
[2] OpenAI, San Francisco, CA 94110 USA
[3] Google Brain, Amsterdam, Netherlands
[4] Univ Amsterdam, Inst Informat, NL-1012 WX Amsterdam, Netherlands
Keywords
Task analysis; Training; Reinforcement learning; Supervised learning; Robots; Navigation; Active learning; curriculum learning; deep reinforcement learning; learning progress;
DOI
10.1109/TNNLS.2019.2934906
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose Teacher-Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms that rely on the intuition that the Student should practice more on those tasks on which it makes the fastest progress, i.e., where the slope of the learning curve is highest. In addition, the Teacher algorithms address the problem of forgetting by also choosing tasks on which the Student's performance is getting worse. We demonstrate that TSCL matches or surpasses the results of carefully hand-crafted curricula in two tasks: addition of decimal numbers with long short-term memory (LSTM) and navigation in Minecraft. Our automatically ordered curriculum of submazes made it possible to solve a Minecraft maze that could not be solved at all when training directly on that maze, and learning was an order of magnitude faster than with uniform sampling of those submazes.
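The selection rule sketched in the abstract can be illustrated with a minimal Teacher: track a recent score history per task, estimate the slope of each learning curve, and sample the next task with probability proportional to the absolute slope, so that tasks being learned quickly and tasks being forgotten are both favored. This is a hedged sketch, not the paper's implementation; the class and method names (`Teacher`, `choose_task`, `update`) and the epsilon-greedy exploration parameter are illustrative assumptions.

```python
import random

class Teacher:
    """Sketch of a TSCL-style task selector (names are hypothetical).

    Samples tasks with probability proportional to the absolute slope
    of their recent learning curves, so both fast progress and
    forgetting (negative slope) attract more practice.
    """

    def __init__(self, n_tasks, window=10, eps=0.1):
        self.scores = [[] for _ in range(n_tasks)]  # score history per task
        self.window = window                         # slope estimation window
        self.eps = eps                               # chance of a random task

    def _slope(self, history):
        # Least-squares slope over the most recent `window` scores.
        h = history[-self.window:]
        n = len(h)
        if n < 2:
            return 0.0
        mx = (n - 1) / 2
        my = sum(h) / n
        num = sum((x - mx) * (y - my) for x, y in enumerate(h))
        den = sum((x - mx) ** 2 for x in range(n))
        return num / den

    def choose_task(self):
        # Occasionally explore a random task.
        if random.random() < self.eps:
            return random.randrange(len(self.scores))
        progress = [abs(self._slope(h)) for h in self.scores]
        total = sum(progress)
        if total == 0.0:
            return random.randrange(len(self.scores))
        # Sample proportionally to |learning progress|.
        r = random.uniform(0.0, total)
        acc = 0.0
        for i, p in enumerate(progress):
            acc += p
            if r <= acc:
                return i
        return len(progress) - 1

    def update(self, task, score):
        # Record the Student's latest score on the chosen task.
        self.scores[task].append(score)
```

In a training loop, the Student repeatedly calls `choose_task()`, trains on that subtask, and reports its evaluation score back via `update()`; with `eps > 0` every task keeps being probed occasionally, so forgetting can be detected even on tasks the sampler has been neglecting.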
Pages: 3732-3740
Number of pages: 9