Selective eye-gaze augmentation to enhance imitation learning in Atari games

Cited by: 7
Authors
Thammineni, Chaitanya [1 ]
Manjunatha, Hemanth [1 ]
Esfahani, Ehsan T. [1 ]
Affiliations
[1] Univ Buffalo, Human Loop Syst Lab, Buffalo, NY 14260 USA
Keywords
Imitation learning; Human-in-the-loop learning; Learning by demonstration; Movements; Attention; Saliency
DOI
10.1007/s00521-021-06367-y
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper presents the selective use of eye-gaze information in learning human actions in Atari games. Extensive evidence suggests that eye movements convey a wealth of information about the direction of attention and mental states, and encode the information necessary to complete a task. Based on this evidence, we hypothesize that the selective use of eye-gaze, as a cue for the direction of attention, will enhance learning from demonstration. For this purpose, we propose a selective eye-gaze augmentation (SEA) network that learns when to use eye-gaze information. The proposed architecture consists of three sub-networks: a gaze prediction network, a gating network, and an action prediction network. From the previous four game frames, the gaze prediction network predicts a gaze map, which is used to augment the input frame. The gating network determines whether the predicted gaze map should be used in learning; its output is fed to the action prediction network, which predicts the action at the current frame. To validate this approach, we use the publicly available Atari Human Eye-Tracking And Demonstration (Atari-HEAD) dataset, which consists of 20 Atari games with 28 million human demonstrations and 328 million eye-gaze samples (over game frames) collected from four subjects. We demonstrate the efficacy of selective eye-gaze augmentation in comparison with the state-of-the-art Attention Guided Imitation Learning (AGIL) and Behavior Cloning (BC). The results indicate that the selective augmentation approach (the SEA network) performs significantly better than AGIL and BC. Moreover, to demonstrate the significance of selecting gaze through the gating network, we compare our approach with random selection of the gaze map; even in this case, the SEA network performs significantly better, validating the advantage of selectively using gaze in learning from demonstration.
Pages: 23401-23410
Page count: 10
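
The abstract describes a three-sub-network architecture (gaze prediction, gating, and action prediction) that operates on the four prior game frames. Below is a minimal PyTorch sketch of that structure; the layer sizes, the 84x84 frame resolution, the sigmoid gate, the gated channel-concatenation used for frame augmentation, and all class names (GazePredictionNet, GatingNet, ActionNet) are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn


class GazePredictionNet(nn.Module):
    """Predicts a gaze saliency map from the four prior game frames."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # 84 -> 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 20 -> 9
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2), nn.ReLU(),  # 9 -> 20
            nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),              # 20 -> 84
        )

    def forward(self, frames):                     # frames: (B, 4, 84, 84)
        logits = self.decoder(self.encoder(frames))
        b = logits.size(0)
        # Normalize to a spatial probability (saliency) map.
        return torch.softmax(logits.view(b, -1), dim=1).view_as(logits)


class GatingNet(nn.Module):
    """Decides, per frame, whether the predicted gaze map should be used."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),   # 84 -> 20
            nn.Flatten(),
            nn.Linear(16 * 20 * 20, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),        # gate in [0, 1]
        )

    def forward(self, frames):
        return self.net(frames)                    # (B, 1)


class ActionNet(nn.Module):
    """Predicts the current action from the gaze-augmented current frame."""
    def __init__(self, n_actions=18):              # 18 = full Atari action set
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=8, stride=4), nn.ReLU(),   # 84 -> 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 20 -> 9
            nn.Flatten(),
            nn.Linear(64 * 9 * 9, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, frame, gaze_map, gate):
        # Scale the gaze channel by the gate so the gating network controls
        # whether attention information reaches the action predictor.
        augmented = torch.cat([frame, gate.view(-1, 1, 1, 1) * gaze_map], dim=1)
        return self.net(augmented)                 # action logits


# Example forward pass on random data.
frames = torch.rand(8, 4, 84, 84)                  # stack of the 4 prior frames
current = frames[:, -1:, :, :]                     # most recent frame, kept 4-D
gaze_map = GazePredictionNet()(frames)
gate = GatingNet()(frames)
action_logits = ActionNet()(current, gaze_map, gate)

In this sketch, scaling the gaze channel by the gate keeps the whole pipeline differentiable while still letting the network learn when the attention signal helps, which is the selectivity the abstract attributes to the gating network.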