Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling

Cited: 23
Authors:
Sattar, Hosnieh [1 ,2 ]
Bulling, Andreas [1 ]
Fritz, Mario [2 ]
Affiliations:
[1] Max Planck Inst Informat, Perceptual User Interfaces Grp, Saarbrucken, Germany
[2] Max Planck Inst Informat, Scalable Learning & Percept Grp, Saarbrucken, Germany
Source:
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017) | 2017
DOI: 10.1109/ICCVW.2017.322
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Predicting the target of visual search from human gaze data is a challenging problem. In contrast to previous work that focused on predicting specific instances of search targets, we propose the first approach to predict a target's category and attributes. However, state-of-the-art models for categorical recognition require large amounts of training data, which is prohibitive for gaze data. We thus propose a novel Gaze Pooling Layer that integrates gaze information and CNN-based features by an attention mechanism - incorporating both spatial and temporal aspects of gaze behaviour. We show that our approach can leverage pre-trained CNN architectures, thus eliminating the need for expensive joint data collection of image and gaze data. We demonstrate the effectiveness of our method on a new 14-participant dataset, and indicate directions for future research in the gaze-based prediction of mental states.
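The abstract describes gaze-weighted attention over CNN features but gives no implementation details. The following is a hypothetical sketch of the general idea only, not the authors' method: fixation points are turned into a Gaussian-blurred spatial attention map, which then weight-averages a pre-trained CNN's feature map into a single descriptor. All names and parameters (`gaze_pooling`, `sigma`, feature-map coordinates for fixations) are illustrative assumptions.

```python
import numpy as np

def gaze_pooling(feature_map, fixations, sigma=1.0):
    """Illustrative sketch (not the paper's code): pool CNN features
    under a gaze-derived spatial attention map.

    feature_map: (H, W, C) array of CNN activations.
    fixations:   list of (row, col) fixation points in feature-map coordinates.
    sigma:       assumed spread of the Gaussian placed on each fixation.
    """
    H, W, C = feature_map.shape
    rows, cols = np.mgrid[0:H, 0:W]
    # Fixation density: sum of isotropic Gaussians centred on each fixation.
    density = np.zeros((H, W))
    for (r, c) in fixations:
        density += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
    # Normalise into an attention map that sums to 1.
    attention = density / density.sum()
    # Attention-weighted spatial pooling -> one C-dimensional descriptor,
    # which a classifier for categories/attributes could consume.
    return (feature_map * attention[..., None]).sum(axis=(0, 1))
```

Because the pooling only re-weights an existing feature map, it can sit on top of any pre-trained CNN without retraining the convolutional layers, which matches the abstract's claim of avoiding joint image-and-gaze data collection.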
Pages: 2740-2748 (9 pages)