Learning to Teach Reinforcement Learning Agents

Cited by: 35
Authors
Fachantidis, Anestis [1]
Taylor, Matthew [2]
Vlahavas, Ioannis [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
[2] Univ Alberta, Borealis AI, CCIS 3-232, Edmonton, AB T6G 2M9, Canada
Keywords
machine learning; reinforcement learning; transfer learning; action advice; machine teaching
DOI
10.3390/make1010002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this article, we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance and the importance of reward discounting in advising. The experiments show that the best performers are not always the best teachers and reveal the non-trivial importance of the coefficient of variation (CV) as a statistic for choosing policies that generate advice. The CV statistic relates variance to the corresponding mean. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel reinforcement learning algorithm capable of learning when to advise or not. The proposed algorithm is able to advise even when it does not have knowledge of the student's intended action and needs significantly less training time compared to previous learning approaches. Finally, in this article, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning.
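As a small illustration of the CV statistic mentioned in the abstract, the sketch below computes it with the standard definition (standard deviation divided by mean). The episode scores and the policy names `teacher_a` and `teacher_b` are hypothetical values chosen for illustration, not results from the article, and the abstract does not specify how the statistic is applied when ranking teachers.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Return the population standard deviation divided by the mean."""
    m = mean(scores)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(scores) / m


# Hypothetical episode scores for two candidate teacher policies
# (made-up numbers, not data from the article).
teacher_a = [3000, 3400, 2600, 3800, 2200]  # higher mean, less consistent
teacher_b = [2800, 2900, 2700, 2850, 2750]  # lower mean, more consistent

cv_a = coefficient_of_variation(teacher_a)
cv_b = coefficient_of_variation(teacher_b)

print(f"teacher_a: mean={mean(teacher_a):.0f}, CV={cv_a:.3f}")
print(f"teacher_b: mean={mean(teacher_b):.0f}, CV={cv_b:.3f}")
```

In this toy data the policy with the higher average score also has the larger CV, which is the kind of trade-off between mean performance and variability that the statistic makes visible.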
Pages: 21-42
Page count: 22