Predicting Eye Fixations With Higher-Level Visual Features

Cited by: 20
Authors
Liang, Ming [1 ,2 ]
Hu, Xiaolin [1 ,2 ,3 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Ctr Brain Inspired Comp Res, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Attention; saliency; feature; visual hierarchy; OBJECT RECOGNITION; MODEL; SEARCH; COLOR; MECHANISMS; ATTENTION; PROVIDE; FIGURE; CORTEX;
DOI
10.1109/TIP.2015.2395713
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Code(s)
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The saliency map and the object map are two contrasting hypotheses about the mechanisms the visual system uses to guide eye fixations when humans freely view natural images. Most computational studies define saliency as outliers in the distributions of low-level features and propose saliency as an important factor for predicting eye fixations. Psychophysical studies, however, suggest that high-level objects predict eye fixations more accurately and that early saliency has only a minor effect. This view has in turn been challenged by a study reporting opposite results, suggesting that the role of object-level features needs further investigation. In addition, little is known about the role of intermediate features between the low-level and object-level features. In this paper, we construct two models based on mid-level and object-level features, respectively, and compare their performance against models based on low-level features. Quantitative evaluation on three benchmark natural-image fixation data sets demonstrates that the mid-level model outperforms the state-of-the-art low-level models by a significant margin, while the object-level model is inferior to most low-level models. Quantitative evaluation on a video fixation data set demonstrates that both the mid-level and object-level models outperform the state-of-the-art low-level models, with the object-level model performing better under three of the four standard metrics. When combined, the two proposed models achieve even higher performance. However, further incorporating the best low-level model yields negligible improvement on all of the data sets. Taken together, these results indicate that higher-level features may be more effective than low-level features for predicting eye fixations on natural images under free viewing.
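The abstract describes scoring predicted saliency maps against recorded fixations with standard metrics and combining the mid-level and object-level maps into a single prediction. The sketch below is an illustrative assumption, not the authors' implementation: it uses a plain AUC metric (fixated vs. non-fixated pixels) via scikit-learn and a simple weighted average after min-max normalization; the function names, array shapes, and the 50/50 weighting are all made up for the example.

```python
# Minimal sketch (assumed, not the paper's method): AUC evaluation of a
# saliency map against a binary fixation mask, plus a simple combination
# of two saliency maps by normalized weighted averaging.
import numpy as np
from sklearn.metrics import roc_auc_score

def saliency_auc(saliency_map, fixation_mask):
    """AUC of saliency values at fixated vs. non-fixated pixels."""
    labels = fixation_mask.astype(int).ravel()   # 1 = fixated pixel
    scores = saliency_map.ravel()
    return roc_auc_score(labels, scores)

def combine_maps(map_a, map_b, weight=0.5):
    """Average two saliency maps after min-max normalization."""
    def norm(m):
        m = m - m.min()
        return m / (m.max() + 1e-12)
    return weight * norm(map_a) + (1.0 - weight) * norm(map_b)

# Random arrays stand in for model outputs and eye-tracking data.
rng = np.random.default_rng(0)
mid_level = rng.random((60, 80))
object_level = rng.random((60, 80))
fixations = rng.random((60, 80)) < 0.02   # ~2% of pixels marked as fixated
print(saliency_auc(combine_maps(mid_level, object_level), fixations))
```

In practice, fixation-prediction benchmarks also use variants such as shuffled AUC and normalized scanpath saliency; the plain AUC above is only the simplest of the standard metrics mentioned in the abstract.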
Pages: 1178-1189
Number of pages: 12