Adversarial Goal Generation for Intrinsic Motivation

Cited by: 0
Authors
Durugkar, Ishan [1]
Stone, Peter [1]
Affiliations
[1] Univ Texas Austin, 2317 Speedway,Stop D9500, Austin, TX 78712 USA
Source
THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In Reinforcement Learning, the goal, or reward signal, is generally given by the environment and cannot be controlled by the agent. We propose to introduce an intrinsic motivation module that selects a reward function for the agent to learn to achieve. We will use a Universal Value Function Approximator (Schaul et al. 2015), which takes as input both the state and the goal (here, the parameters of the selected reward function) and predicts the value function (or action-value function), so that it can generalize across goals. The intrinsic motivation module will be trained to generate goals that maximize the agent's learning. This makes the approach a method for automatic curriculum learning as well.
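As an illustration of the goal-conditioned value function described in the abstract, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes a small two-layer network, a discrete action set, and a hypothetical linear reward family in which the goal vector holds the reward weights; the names `TinyUVFA`, `linear_reward`, and `td_update` are invented for this example. The goal-generating module itself is not sketched, since the abstract does not specify how it is trained.

```python
import numpy as np


def linear_reward(state, goal):
    # Hypothetical reward family (an assumption): the goal vector holds
    # the weights of a reward that is linear in the state.
    return float(np.dot(goal, state))


class TinyUVFA:
    """Q(s, g, a): action values conditioned on both state and goal."""

    def __init__(self, state_dim, goal_dim, n_actions, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + goal_dim
        self.W1 = rng.normal(0.0, 0.1, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, size=(hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def _forward(self, state, goal):
        # The state and the goal parameters are concatenated into one input.
        x = np.concatenate([state, goal])
        h = np.tanh(x @ self.W1 + self.b1)
        return x, h, h @ self.W2 + self.b2

    def q_values(self, state, goal):
        return self._forward(state, goal)[2]

    def td_update(self, state, goal, action, reward, next_state,
                  gamma=0.99, lr=0.01):
        """One semi-gradient Q-learning step; returns the TD error."""
        x, h, q = self._forward(state, goal)
        target = reward + gamma * np.max(self.q_values(next_state, goal))
        td_error = target - q[action]
        # Backpropagate through the chosen action's output only.
        grad_h = self.W2[:, action] * (1.0 - h ** 2)
        self.W2[:, action] += lr * td_error * h
        self.b2[action] += lr * td_error
        self.W1 += lr * td_error * np.outer(x, grad_h)
        self.b1 += lr * td_error * grad_h
        return float(td_error)


# Usage: the same state is evaluated under two different goals.
model = TinyUVFA(state_dim=2, goal_dim=2, n_actions=3)
state = np.array([1.0, 0.0])
print(model.q_values(state, np.array([1.0, 0.0])))
print(model.q_values(state, np.array([0.0, 1.0])))
```

Because the goal is part of the input, a single set of weights is shared across all goals, which is what would let value estimates transfer to goals the agent has not yet trained on.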
Pages: 8073-8074
Page count: 2
Related papers
15 entries in total
[1] Andrychowicz M., 2017, ADV NEURAL INF PROCE
[2] [Anonymous], 2004, Advances in Neural Information Processing Systems, DOI 10.21236/ADA440280
[3] [Anonymous], 2017, ARXIV170700183
[4] [Anonymous], ARXIV170506366
[5] [Anonymous], 2017, CoRR
[6] [Anonymous], 1998, REINFORCEMENT LEARNI
[7] Barto Andrew G, 2005, Proceedings of the Thirteenth Yale Workshop on Adaptive and Learning Systems, P113
[8] Cabi S., 2017, ARXIV170703300
[9] Graves Alex, 2017, ARXIV170403003
[10] Human-level control through deep reinforcement learning [J]. Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis. NATURE, 2015, 518 (7540): 529-533