Learning to predict eye fixations for semantic contents using multi-layer sparse network

Times cited: 36

Authors:

Shen, Chengyao [1]

Zhao, Qi [2]

Affiliations:

[1] Natl Univ Singapore, NUS Grad Sch Integrat Sci & Engn NGS, Singapore 117456, Singapore

[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117576, Singapore

Source:

NEUROCOMPUTING | 2014 / Vol. 138

Keywords:

Semantic saliency; Gaze prediction; Sparse coding; Deep learning; OBJECT RECOGNITION; VISUAL-ATTENTION; SALIENCY; MECHANISM; SHIFTS;

DOI:

10.1016/j.neucom.2013.09.053

CLC number (Chinese Library Classification):

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we present a novel model for saliency prediction within a unified framework of feature integration. The model distinguishes itself by learning directly from natural images and by automatically incorporating higher-level semantic information into gaze prediction in a scalable manner. Unlike most existing saliency models, which rely on specific features or object detectors, our model learns multiple stages of features that mimic the hierarchical organization of the ventral stream in the visual cortex, and integrates them by adapting their weights to ground-truth fixation data. To accomplish this, we use a multi-layer sparse network to learn low-, mid- and high-level features from natural images, and train a linear support vector machine (SVM) for weight adaptation and feature integration. Experimental results show that our model can learn high-level semantic features such as faces and text, and performs competitively with existing approaches in predicting eye fixations. (C) 2014 Elsevier B.V. All rights reserved.
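The abstract describes the integration step only at a high level: the weights of the learned feature channels are adapted by a linear SVM trained on ground-truth fixations. The Python/NumPy sketch below illustrates that idea under explicit assumptions and is not the authors' implementation. It assumes the multi-layer sparse network has already produced K per-pixel feature maps. The helper names (`sample_training_pixels`, `train_linear_svm`, `saliency_map`), the random sampling of non-fixated pixels as negatives, the hyperparameters, the synthetic toy data, and the use of plain full-batch subgradient descent in place of a dedicated SVM solver are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_training_pixels(feature_maps, fixation_mask, n_neg, seed=0):
    """Build SVM training pairs (X, y) from per-pixel feature maps.

    feature_maps:  (K, H, W) array, one map per learned feature channel.
    fixation_mask: (H, W) boolean array, True where observers fixated.
    Positives are all fixated pixels; negatives are n_neg randomly chosen
    non-fixated pixels (a hypothetical sampling scheme, not the paper's).
    """
    rng = np.random.default_rng(seed)
    k = feature_maps.shape[0]
    flat = feature_maps.reshape(k, -1).T             # (H*W, K): one row per pixel
    mask = fixation_mask.ravel()
    pos = np.flatnonzero(mask)
    neg_pool = np.flatnonzero(~mask)
    neg = rng.choice(neg_pool, size=min(n_neg, neg_pool.size), replace=False)
    X = np.concatenate([flat[pos], flat[neg]])
    y = np.concatenate([np.ones(pos.size), -np.ones(neg.size)])
    return X, y


def train_linear_svm(X, y, lam=1e-2, lr=0.1, iters=500):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))
    by full-batch subgradient descent (a stand-in for any SVM solver)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        active = margins < 1.0                       # samples violating the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def saliency_map(feature_maps, w, b):
    """Integrate feature maps with the learned weights; rescale to [0, 1]."""
    s = np.tensordot(w, feature_maps, axes=1) + b    # (H, W) weighted sum
    s = s - s.min()
    return s / s.max() if s.max() > 0 else s


# Synthetic toy data: channel 0 stands in for a hypothetical "face" feature
# that is high where observers looked; channel 1 is unrelated noise.
rng = np.random.default_rng(1)
height, width = 32, 32
fix = np.zeros((height, width), dtype=bool)
fix[10:14, 20:24] = True
face = fix.astype(float) + 0.1 * rng.standard_normal((height, width))
noise = rng.standard_normal((height, width))
maps = np.stack([face, noise])

X, y = sample_training_pixels(maps, fix, n_neg=200)
w, b = train_linear_svm(X, y)
sal = saliency_map(maps, w, b)
print("learned channel weights:", w)
```

Because the classifier is linear, the learned vector `w` can be read directly as the relative contribution of each feature channel to the final saliency map. In the toy data, the channel aligned with the fixations should receive most of the weight.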

Pages: 61-68

Page count: 8
