Top-Down Visual Saliency via Joint CRF and Dictionary Learning

Cited by: 103

Authors:

Yang, Jimei [1]

Yang, Ming-Hsuan [2]

Affiliations:

[1] Adobe Res, San Jose, CA 95110 USA

[2] Univ Calif Merced, Sch Engn, Merced, CA USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2017 / Vol. 39 / No. 03

Funding:

National Science Foundation (USA);

Keywords:

Visual saliency; top-down visual saliency; fixation prediction; dictionary learning and conditional random fields; FEATURES; ATTENTION;

DOI:

10.1109/TPAMI.2016.2547384

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Top-down visual saliency is an important module of visual attention. In this work, we propose a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a visual dictionary. The proposed model incorporates a layered structure from top to bottom: CRF, sparse coding and image patches. With sparse coding as an intermediate layer, CRF is learned in a feature-adaptive manner; meanwhile with CRF as the output layer, the dictionary is learned under structured supervision. For efficient and effective joint learning, we develop a max-margin approach via a stochastic gradient descent algorithm. Experimental results on the Graz-02 and PASCAL VOC datasets show that our model performs favorably against state-of-the-art top-down saliency methods for target object localization. In addition, the dictionary update significantly improves the performance of our model. We demonstrate the merits of the proposed top-down saliency model by applying it to prioritizing object proposals for detection and predicting human fixations.
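The layered structure described in the abstract (image patches, then sparse codes, then an output layer) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `sparse_code` and `saliency_scores`, the ISTA solver, the random dictionary, and the sigmoid output are not taken from the paper. In particular, the sketch uses only a per-patch linear score and omits the CRF pairwise terms and the joint max-margin learning of the dictionary and CRF weights.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=100):
    """Approximate argmin_s 0.5*||x - D s||^2 + lam*||s||_1 with ISTA.

    ISTA is an illustrative solver choice, not necessarily the paper's.
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth term
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ s - x)           # gradient of the reconstruction error
        z = s - grad / L                   # gradient step
        s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return s

def saliency_scores(patches, D, w, lam=0.1):
    """Unary-only stand-in for the output layer: sigmoid of a linear score
    on each patch's sparse code (no CRF pairwise terms)."""
    codes = np.stack([sparse_code(x, D, lam) for x in patches])
    return 1.0 / (1.0 + np.exp(-(codes @ w))), codes

# Hypothetical toy data: 5 patches of dimension 16, dictionary of 32 atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
patches = rng.standard_normal((5, 16))
w = rng.standard_normal(32)                # hypothetical classifier weights
scores, codes = saliency_scores(patches, D, w)
```

In the model the abstract describes, `D` and `w` would be learned jointly rather than sampled at random, and the per-patch scores would be coupled through the CRF; the sketch only shows how sparse coding can sit between the patches and the output layer.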

Citation

Pages: 576-588

Page count: 13
