Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling

Cited: 23
Authors:
Sattar, Hosnieh [1 ,2 ]
Bulling, Andreas [1 ]
Fritz, Mario [2 ]
Affiliations:
[1] Max Planck Inst Informat, Perceptual User Interfaces Grp, Saarbrucken, Germany
[2] Max Planck Inst Informat, Scalable Learning & Percept Grp, Saarbrucken, Germany
Source:
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017) | 2017
DOI: 10.1109/ICCVW.2017.322
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Predicting the target of visual search from human gaze data is a challenging problem. In contrast to previous work that focused on predicting specific instances of search targets, we propose the first approach to predict a target's category and attributes. However, state-of-the-art models for categorical recognition require large amounts of training data, which is prohibitive for gaze data. We thus propose a novel Gaze Pooling Layer that integrates gaze information and CNN-based features by an attention mechanism - incorporating both spatial and temporal aspects of gaze behaviour. We show that our approach can leverage pre-trained CNN architectures, thus eliminating the need for expensive joint data collection of image and gaze data. We demonstrate the effectiveness of our method on a new 14-participant dataset, and indicate directions for future research in the gaze-based prediction of mental states.
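The abstract describes gaze-weighted attention over CNN features but gives no implementation details. The following is a hypothetical sketch of the general idea only, not the authors' method: fixation points are turned into a Gaussian-blurred spatial attention map, which then weight-averages a pre-trained CNN's feature map into a single descriptor. All names and parameters (`gaze_pooling`, `sigma`, feature-map coordinates for fixations) are illustrative assumptions.

```python
import numpy as np

def gaze_pooling(feature_map, fixations, sigma=1.0):
    """Illustrative sketch (not the paper's code): pool CNN features
    under a gaze-derived spatial attention map.

    feature_map: (H, W, C) array of CNN activations.
    fixations:   list of (row, col) fixation points in feature-map coordinates.
    sigma:       assumed spread of the Gaussian placed on each fixation.
    """
    H, W, C = feature_map.shape
    rows, cols = np.mgrid[0:H, 0:W]
    # Fixation density: sum of isotropic Gaussians centred on each fixation.
    density = np.zeros((H, W))
    for (r, c) in fixations:
        density += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
    # Normalise into an attention map that sums to 1.
    attention = density / density.sum()
    # Attention-weighted spatial pooling -> one C-dimensional descriptor,
    # which a classifier for categories/attributes could consume.
    return (feature_map * attention[..., None]).sum(axis=(0, 1))
```

Because the pooling only re-weights an existing feature map, it can sit on top of any pre-trained CNN without retraining the convolutional layers, which matches the abstract's claim of avoiding joint image-and-gaze data collection.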
Pages: 2740-2748 (9 pages)