Predicting human gaze beyond pixels

Cited by: 266
Authors
Xu, Juan [1 ]
Jiang, Ming [1 ]
Wang, Shuo [2 ]
Kankanhalli, Mohan S. [3 ]
Zhao, Qi [1 ]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117548, Singapore
[2] CALTECH, Pasadena, CA 91125 USA
[3] Natl Univ Singapore, Sch Comp, Dept Comp Sci, Singapore 117548, Singapore
Keywords
visual saliency; saliency attribute; object saliency; semantic saliency; dataset; computational model; fusiform face area; visual attention; eye movements; fixation selection; luminance contrast; cortical region; saliency; model; objects; system
DOI
10.1167/14.1.28
Chinese Library Classification
R77 [Ophthalmology]
Subject Classification Code
100212
Abstract
A large body of previous models for predicting where people look in natural scenes has focused on pixel-level image attributes. To bridge the semantic gap between the predictive power of computational saliency models and human behavior, we propose a new saliency architecture that incorporates information at three layers: pixel-level image attributes, object-level attributes, and semantic-level attributes. Object- and semantic-level information is frequently ignored, or only a few sample object categories are discussed, where scaling to a large number of object categories is neither feasible nor neurally plausible. To address this problem, this work constructs a principled vocabulary of basic attributes to describe object- and semantic-level information, thus avoiding restriction to a limited number of object categories. We build a new dataset of 700 images with eye-tracking data from 15 viewers and annotation data for 5,551 segmented objects with fine contours and 12 semantic attributes (publicly available with the paper). Experimental results demonstrate the importance of object- and semantic-level information in the prediction of visual attention.
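The three-layer architecture described above fuses pixel-, object-, and semantic-level attribute maps into a single saliency prediction. A minimal sketch of such a fusion step is shown below; the function names and the fixed weights are hypothetical placeholders (in the paper, the contribution of each layer is learned from fixation data rather than hand-set):

```python
import numpy as np

def normalize(feat_map):
    """Scale a feature map to [0, 1]; a flat map becomes all zeros."""
    lo, hi = feat_map.min(), feat_map.max()
    if hi == lo:
        return np.zeros_like(feat_map, dtype=float)
    return (feat_map - lo) / (hi - lo)

def combine_saliency(pixel_map, object_map, semantic_map,
                     weights=(0.4, 0.3, 0.3)):
    """Fuse three attribute maps into one saliency map.

    The weights here are illustrative only; the actual model learns
    the relative importance of each layer from human eye-tracking data.
    """
    w_p, w_o, w_s = weights
    fused = (w_p * normalize(pixel_map)
             + w_o * normalize(object_map)
             + w_s * normalize(semantic_map))
    return normalize(fused)
```

Each input map is normalized before fusion so that no single layer dominates purely through its numeric range; the weighted sum is then renormalized to yield a comparable saliency map.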
Pages: 20