Visual Search Target Inference in Natural Interaction Settings with Machine Learning

Cited: 17
Authors
Barz, Michael [1 ,3 ]
Stauden, Sven [2 ]
Sonntag, Daniel [1 ]
Affiliations
[1] German Res Ctr Artificial Intelligence DFKI, Saarbrucken, Germany
[2] Saarland Univ, Saarbrucken, Germany
[3] Saarbrucken Grad Sch Comp Sci, Saarbrucken, Germany
Source
ETRA'20 FULL PAPERS: ACM SYMPOSIUM ON EYE TRACKING RESEARCH AND APPLICATIONS | 2020
Keywords
Mobile Eyetracking; Visual Attention; Search Target Inference; Machine Learning; EYE-MOVEMENTS; REVEAL;
DOI
10.1145/3379155.3391314
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Visual search is a perceptual task in which humans aim to identify a search target object, such as a traffic sign, among other objects. Search target inference subsumes computational methods for predicting this target by tracking and analyzing a person's overt behavioral cues, e.g., gaze and the fixated visual stimuli. We present a generic approach to inferring search targets in natural scenes by predicting the class of the image segment surrounding each fixation. Our method encodes visual search sequences as histograms of fixated segment classes determined by SegNet, a deep learning image segmentation model for natural scenes. We compare our sequence encoding and model training (SVM) against a recent baseline from the literature for predicting the target segment, and we use a new search target inference dataset. The results show, first, that our segmentation-based sequence encoding outperforms the method from the literature and, second, that it enables target inference in natural settings.
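The core encoding described in the abstract — turning a fixation sequence into a normalized histogram over semantic segment classes — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, input shapes, and the toy segmentation map are assumptions; in the paper the per-pixel class ids would come from SegNet and the resulting feature vectors would be fed to an SVM.

```python
# Hypothetical sketch of the fixation-histogram encoding (assumed
# interfaces; the paper obtains seg_map from SegNet and trains an SVM
# on the resulting feature vectors).
from collections import Counter
from typing import List, Tuple

def encode_fixations(
    fixations: List[Tuple[int, int]],   # (x, y) gaze fixation points
    seg_map: List[List[int]],           # per-pixel class ids (row-major)
    n_classes: int,
) -> List[float]:
    """Encode a visual search sequence as a normalized histogram of
    the semantic classes under each fixation."""
    counts = Counter(seg_map[y][x] for x, y in fixations)
    total = sum(counts.values()) or 1  # guard against empty sequences
    return [counts.get(c, 0) / total for c in range(n_classes)]

# Toy 4x4 segmentation map: class 0 = background, class 1 = sign.
seg_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Three fixations: two land on the "sign" segment, one on background.
feats = encode_fixations([(2, 0), (3, 1), (0, 3)], seg_map, n_classes=2)
# feats is a 2-bin histogram summing to 1.0; one such vector per search
# sequence would serve as the input to an SVM target classifier.
```

The design choice here is that the encoding is independent of fixation order and image resolution: any search sequence maps to a fixed-length vector of length `n_classes`, which is what makes standard classifiers such as an SVM applicable.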
Pages: 8