What stands out in a scene? A study of human explicit saliency judgment

Cited by: 115
Authors
Borji, Ali [1]
Sihite, Dicky N. [1]
Itti, Laurent [1,2,3]
Affiliations
[1] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
[2] Univ So Calif, Neurosci Grad Program, Los Angeles, CA 90089 USA
[3] Univ So Calif, Dept Psychol, Los Angeles, CA 90089 USA
Funding
U.S. National Science Foundation;
Keywords
Explicit saliency judgment; Space-based attention; Eye movements; Bottom-up saliency; Free viewing; Object-based attention; SELECTIVE VISUAL-ATTENTION; EYE-MOVEMENTS; SEARCH; MODEL; GUIDANCE; MECHANISMS; ALLOCATION; LOCATIONS; FEATURES; OBJECTS;
DOI
10.1016/j.visres.2013.07.016
Chinese Library Classification
Q189 [Neuroscience];
Discipline Classification Code
071006;
Abstract
Eye tracking has become the de facto standard measure of visual attention in tasks that range from free viewing to complex daily activities. In particular, saliency models are often evaluated by their ability to predict human gaze patterns. However, fixations are not only influenced by bottom-up saliency (computed by the models), but also by many top-down factors. Thus, comparing bottom-up saliency maps to eye fixations is challenging and has required that one try to minimize top-down influences, for example by focusing on early fixations on a stimulus. Here we propose two complementary procedures to evaluate visual saliency. We ask whether humans have explicit and conscious access to the saliency computations believed to contribute to guiding attention and eye movements. In the first experiment, 70 observers were asked to choose which object stands out the most based on its low-level features in 100 images, each containing only two objects. Using several state-of-the-art bottom-up visual saliency models that measure local and global spatial image outliers, we show that maximum saliency inside the selected object is significantly higher than inside the non-selected object and the background. Thus, spatial outliers are a predictor of human judgments. Performance of this predictor is boosted by including object size as an additional feature. In the second experiment, observers were asked to draw a polygon circumscribing the most salient object in cluttered scenes. For each of 120 images, we show that a map built from annotations of 70 observers explains eye fixations of another 20 observers freely viewing the images, significantly above chance (dataset by Bruce and Tsotsos (2009); shuffled AUC score 0.62 +/- 0.07, chance 0.50, t-test p < 0.05). We conclude that fixations agree with saliency judgments, and classic bottom-up saliency models explain both. We further find that computational models specifically designed for fixation prediction slightly outperform models designed for salient object detection over both types of data (i.e., fixations and objects). Published by Elsevier Ltd.
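As an aside on the evaluation just described: the shuffled AUC metric can be sketched in a few lines. The Python snippet below is a minimal illustration, not the authors' implementation; the function name shuffled_auc, the 2-D NumPy saliency map, the (row, column) fixation arrays, and the use of scikit-learn's roc_auc_score are all assumptions made here for clarity.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def shuffled_auc(saliency_map, fixations, control_fixations):
        # Illustrative sketch (not from the paper). Positives: saliency
        # values at this image's fixation locations.
        pos = saliency_map[fixations[:, 0], fixations[:, 1]]
        # Negatives: values at fixation locations borrowed from OTHER
        # images, which discounts the shared center bias of human gaze.
        neg = saliency_map[control_fixations[:, 0], control_fixations[:, 1]]
        labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
        scores = np.concatenate([pos, neg])
        # Area under the ROC curve: 0.50 is chance, 1.00 is perfect.
        return roc_auc_score(labels, scores)

Under this reading, a map built from one group's polygon annotations, scored against another group's free-viewing fixations and yielding a mean of about 0.62 versus a chance level of 0.50, corresponds to the result reported above.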
Pages: 62-77
Page count: 16
References (87 total)
  • [1] Achanta R, 2009, PROC CVPR IEEE, P1597, DOI 10.1109/CVPRW.2009.5206596
  • [2] Alpert S., 2007, IEEE C COMP VIS PATT, P1
  • [3] [Anonymous], 2008, Orienting of Attention
  • [4] [Anonymous], ADV NEURAL INFORM PR
  • [5] [Anonymous], 2007, Computer Vision and Pattern Recognition (CVPR), IEEE Conference on
  • [6] Ballard DH, Hayhoe MM, Pelz JB, 1995, Memory representations in natural tasks, JOURNAL OF COGNITIVE NEUROSCIENCE, 7(1): 66-80
  • [7] Baluch F, Itti L, 2011, Mechanisms of top-down attention, TRENDS IN NEUROSCIENCES, 34(4): 210-224
  • [8] Berg A., 2012, IEEE C COMP VIS PATT
  • [9] Borji A., 2013, IEEE T IMAGE PROCESS
  • [10] Borji A, Sihite DN, Itti L, 2011, Computational modeling of top-down visual attention in interactive environments, PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2011