Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling

Cited by: 23
Authors
Sattar, Hosnieh [1,2]
Bulling, Andreas [1]
Fritz, Mario [2]
Affiliations
[1] Max Planck Inst Informat, Perceptual User Interfaces Grp, Saarbrucken, Germany
[2] Max Planck Inst Informat, Scalable Learning & Percept Grp, Saarbrucken, Germany
Source
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017) | 2017
DOI
10.1109/ICCVW.2017.322
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Predicting the target of visual search from human gaze data is a challenging problem. In contrast to previous work that focused on predicting specific instances of search targets, we propose the first approach to predict a target's category and attributes. However, state-of-the-art models for categorical recognition require large amounts of training data, which is prohibitive for gaze data. We thus propose a novel Gaze Pooling Layer that integrates gaze information and CNN-based features through an attention mechanism that incorporates both spatial and temporal aspects of gaze behaviour. We show that our approach can leverage pre-trained CNN architectures, thus eliminating the need for expensive joint collection of image and gaze data. We demonstrate the effectiveness of our method on a new 14-participant dataset, and indicate directions for future research in the gaze-based prediction of mental states.
Pages: 2740-2748
Number of pages: 9
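
The abstract describes the Gaze Pooling Layer only at a high level: fixation-derived spatial attention is combined with pre-trained CNN feature maps, and gaze behaviour is aggregated over time. The following is a minimal NumPy sketch of that general idea, not the authors' implementation; the Gaussian fixation-density construction, the `sigma` parameter, and the simple temporal averaging are illustrative assumptions.

```python
# A minimal sketch of gaze-weighted feature pooling, assuming fixations have
# already been mapped onto the CNN feature map's spatial grid.
import numpy as np

def gaze_pooling(feature_map, fixation_points, sigma=1.5):
    """Pool a CNN feature map of shape (C, H, W) with a fixation density map.

    fixation_points: iterable of (row, col) fixation locations in feature-map
    coordinates (an assumption made for this sketch).
    """
    C, H, W = feature_map.shape

    # Build a fixation density map by placing a Gaussian at each fixation.
    ys, xs = np.mgrid[0:H, 0:W]
    density = np.zeros((H, W), dtype=np.float64)
    for (r, c) in fixation_points:
        density += np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2.0 * sigma ** 2))

    # Normalise to a spatial attention distribution over the feature map.
    density /= density.sum() + 1e-12

    # Weight each spatial location's feature vector by the gaze attention and
    # sum over space, yielding one C-dimensional descriptor per image.
    pooled = (feature_map * density[None, :, :]).reshape(C, -1).sum(axis=1)
    return pooled

def episode_descriptor(per_image_pooled):
    """Aggregate gaze-pooled descriptors over the images viewed in one search
    episode; a plain average is just one plausible temporal choice."""
    return np.mean(np.stack(per_image_pooled, axis=0), axis=0)
```

In such a setup, the resulting descriptor could be fed to an ordinary classifier for the target's category and attributes, which is consistent with the paper's point that pre-trained CNN features can be reused without joint image-and-gaze training data.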