Two-Stage Learning to Predict Human Eye Fixations via SDAEs

Times cited: 114

Authors:

Han, Junwei [1]

Zhang, Dingwen [1]

Wen, Shifeng [1]

Guo, Lei [1]

Liu, Tianming [2]

Li, Xuelong [3]

Affiliations:

[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Peoples R China

[2] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA

[3] Chinese Acad Sci, Xian Inst Opt & Precis Mech, State Key Lab Transient Opt & Photon, Ctr OPT IMagery Anal & Learning, Xian 710119, Peoples R China

Source:

IEEE TRANSACTIONS ON CYBERNETICS | 2016 / Vol. 46 / No. 02

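Funding:

Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China; U.S. National Science Foundation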

Keywords:

Deep networks; eye fixation prediction; saliency detection; stacked denoising autoencoders (SDAEs); VISUAL SALIENCY; OBJECT DETECTION; RETRIEVAL; ATTENTION; AUTOENCODERS; FRAMEWORK; MODEL

DOI:

10.1109/TCYB.2015.2404432

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Saliency detection models, which aim to quantitatively predict the locations in the visual field that attract human eye fixations, have received increasing research interest in recent years. Unlike traditional methods that rely on hand-designed features and contrast inference mechanisms, this paper proposes a novel framework that learns saliency detection models from raw image data using deep networks. The proposed framework consists of two learning stages. In the first stage, we develop a stacked denoising autoencoder (SDAE) model to learn robust, representative features from raw image data in an unsupervised manner. The second stage jointly learns optimal mechanisms for capturing the intrinsic mutual patterns that constitute feature contrast and for integrating them into the final saliency prediction. Taking as input pairs of a center patch and its surrounding patches, each represented by the features learned in the first stage, an SDAE network is trained under the supervision of eye fixation labels, so that it performs contrast inference and contrast integration simultaneously. Experiments on three publicly available eye-tracking benchmarks and comparisons with 16 state-of-the-art approaches demonstrate the effectiveness of the proposed framework.
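The first stage described above rests on a generic technique: greedy layer-wise pretraining of denoising autoencoders on raw patches. The sketch below illustrates that technique only and is not the authors' implementation. Every detail is an assumption: tied weights, sigmoid units, 30% masking noise, cross-entropy reconstruction loss, full-batch gradient descent, and the layer sizes. The helper `center_surround_pairs` is hypothetical and shows just one possible way to form center-surround input pairs for the second stage. The supervised second-stage SDAE trained on eye fixation labels is not shown.

```python
import numpy as np


def sigmoid(x):
    # Clip to avoid overflow warnings in exp for very large |x|.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -500, 500)))


class DenoisingAutoencoder:
    """One DAE layer (assumed design): tied weights, sigmoid units,
    masking noise, cross-entropy reconstruction loss."""

    def __init__(self, n_visible, n_hidden, corruption=0.3, seed=0):
        self.rng = np.random.default_rng(seed)
        bound = 4.0 * np.sqrt(6.0 / (n_visible + n_hidden))
        self.W = self.rng.uniform(-bound, bound, size=(n_visible, n_hidden))
        self.b_h = np.zeros(n_hidden)   # encoder (hidden) bias
        self.b_v = np.zeros(n_visible)  # decoder (visible) bias
        self.corruption = corruption

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_h)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def train_step(self, x, lr=0.1):
        """One gradient step on a batch x with values in [0, 1]; returns the loss."""
        # Masking noise: zero out a random fraction of the inputs.
        mask = self.rng.random(x.shape) >= self.corruption
        x_tilde = x * mask
        h = self.encode(x_tilde)
        z = self.decode(h)
        eps = 1e-9
        loss = -np.mean(np.sum(x * np.log(z + eps) + (1 - x) * np.log(1 - z + eps), axis=1))
        # Gradients of the mean cross-entropy w.r.t. the pre-activations.
        n = x.shape[0]
        dz = (z - x) / n                  # decoder pre-activation
        dh = (dz @ self.W) * h * (1 - h)  # encoder pre-activation
        # Tied weights: W collects gradient from both encoder and decoder.
        grad_W = x_tilde.T @ dh + dz.T @ h
        self.W -= lr * grad_W
        self.b_h -= lr * dh.sum(axis=0)
        self.b_v -= lr * dz.sum(axis=0)
        return loss


def pretrain_stack(x, layer_sizes, epochs=50, lr=0.1):
    """Greedy layer-wise pretraining: each DAE is trained on the clean
    codes produced by the layer below it."""
    layers, inp = [], x
    for i, n_hidden in enumerate(layer_sizes):
        dae = DenoisingAutoencoder(inp.shape[1], n_hidden, seed=i)
        for _ in range(epochs):
            dae.train_step(inp, lr)
        layers.append(dae)
        inp = dae.encode(inp)
    return layers, inp


def center_surround_pairs(features, radius=1):
    """Hypothetical stage-2 input builder. `features` is an (H, W, D) grid of
    learned patch features; each interior cell's feature is concatenated with
    each neighbour's feature in its (2r+1) x (2r+1) window."""
    H, W, _ = features.shape
    pairs = []
    for i in range(radius, H - radius):
        for j in range(radius, W - radius):
            center = features[i, j]
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    if di == 0 and dj == 0:
                        continue
                    pairs.append(np.concatenate([center, features[i + di, j + dj]]))
    return np.stack(pairs)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    patches = rng.random((200, 64))  # 200 synthetic 8x8 patches in [0, 1]
    layers, codes = pretrain_stack(patches, [32, 16], epochs=20)
    print(codes.shape)  # (200, 16)
    grid = codes[:16].reshape(4, 4, 16)
    print(center_surround_pairs(grid).shape)  # (32, 32)
```

With a 4 x 4 grid and radius 1, the four interior cells each pair with eight neighbours, giving 32 pairs whose dimension is twice the feature size. Training each layer on the clean output of the layer below, while corrupting only that layer's own input, follows the usual stacked-DAE recipe.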


Pages: 487-498

Number of pages: 12

References (60 in total):

[1] [Anonymous], 2011, AISTATS

[2] Bengio Yoshua, 2013, Statistical Language and Speech Processing. First International Conference, SLSP 2013. Proceedings: LNCS 7978, P1, DOI 10.1007/978-3-642-39593-2_1

[3] Bengio Y., 2012, P ICML WORKSH UNS TR, V27, P17, DOI 10.1109/IJCNN.2011.6033302

[4] Bengio, Yoshua; Courville, Aaron; Vincent, Pascal. Representation Learning: A Review and New Perspectives [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35(08): 1798-1828

[5] Bengio, Yoshua. Learning Deep Architectures for AI [J]. FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2009, 2(01): 1-127

[6] Borji A., 2012, CVPR, DOI 10.1109/CVPR.2012.6247706

[7] Borji, Ali; Sihite, Dicky N.; Itti, Laurent. Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2013, 22(01): 55-69

[8] Borji A, 2012, PROC CVPR IEEE, P478, DOI 10.1109/CVPR.2012.6247711

[9] Bruce, Neil D. B.; Tsotsos, John K. Saliency, attention, and visual search: An information theoretic approach [J]. JOURNAL OF VISION, 2009, 9(03)

[10] Cerf M., 2008, Advances in Neural Information Processing Systems, V20, P241
