Weakly-supervised scene parsing with multiple contextual cues

Cited by: 3

Authors:

Li, Teng [1]

Wu, Xinyu [2]

Ni, Bingbing [3]

Lu, Ke [4]

Yan, Shuicheng [5]

Affiliations:

[1] Anhui Univ, Coll Elect Engn & Automat, Hefei, Peoples R China

[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Beijing 100864, Peoples R China

[3] Adv Digital Sci Ctr, Singapore 138632, Singapore

[4] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[5] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117548, Singapore

Source:

INFORMATION SCIENCES | 2015 / Vol. 323

Keywords:

Scene parsing; Weakly-supervised; Multiple context; IMAGE; CLASSIFICATION; KERNELS

DOI:

10.1016/j.ins.2015.06.024

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Scene parsing, which assigns a label to every region of an image, is one of the core problems of computer vision. Previous methods usually rely on patch-level models trained on well-labeled data. In this paper, we propose a weakly-supervised scene parsing algorithm that semantically parses a collection of images carrying multiple image-level labels. Parsing is guided by top-down category models and by bottom-up local patch contexts across images, on the premise that closely related segments usually have similar labels. Images are segmented into patches at multiple levels, and the contextual relations among patches are discovered via sparse representation by ℓ1 minimization, from which a graph is constructed. The multi-level spatial context of patches is also embedded in the graph, so that image-level labels can be propagated to segments optimally. The contextual patch labeling process is formulated as an optimization problem and solved by a convergent iterative method. The category models are learned from the decomposed label representations of the image set and applied to the segments. The final labeling is obtained by combining all the information at the pixel level. Experiments on two benchmark datasets, with comparisons, demonstrate the effectiveness of the proposed method. © 2015 Elsevier Inc. All rights reserved.
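
The abstract gives no formulas, so the following is only a minimal sketch of two of the ideas it names: a patch graph built from sparse representations by ℓ1 minimization, and propagation of image-level labels over that graph. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names sparse_codes and propagate, the ISTA solver, the normalized-graph propagation rule, the parameters lam and alpha, and the toy data. Multi-level segmentation, spatial context, and the category models are not covered.

```python
import numpy as np

def sparse_codes(X, lam=0.1, n_iter=200):
    """Code each patch over all other patches with an l1 penalty (ISTA).

    For patch i this approximately solves
        min_w 0.5 * ||x_i - D w||^2 + lam * ||w||_1,
    where the columns of D are the remaining patches.
    Returns a nonnegative weight matrix W with a zero diagonal.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.delete(np.arange(n), i)
        D = X[idx].T                              # shape (dim, n - 1)
        L = np.linalg.norm(D, 2) ** 2 + 1e-12     # Lipschitz constant of the gradient
        w = np.zeros(n - 1)
        for _ in range(n_iter):
            z = w - D.T @ (D @ w - X[i]) / L      # gradient step
            w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
        W[i, idx] = np.abs(w)
    return W

def propagate(W, Y, alpha=0.9, n_iter=100):
    """Iterate F <- alpha * S F + (1 - alpha) * Y on the symmetrized graph."""
    A = 0.5 * (W + W.T)
    d = A.sum(axis=1)
    d[d == 0] = 1.0                               # guard isolated nodes
    S = A / np.sqrt(np.outer(d, d))               # symmetric normalization
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

# Toy data (hypothetical): six patch features forming two visual clusters.
rng = np.random.default_rng(0)
base = np.zeros((6, 5))
base[:3, 0] = 1.0
base[3:, 1] = 1.0
X = base + 0.02 * rng.standard_normal((6, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Each patch starts with ALL labels of its image (weak supervision):
# image A {0} -> patches 0, 1; image B {0, 1} -> patches 2, 3, 4; image C {1} -> patch 5.
Y = np.array([[1, 0], [1, 0], [1, 1], [1, 1], [1, 1], [0, 1]], dtype=float)

W = sparse_codes(X)
F = propagate(W, Y)
labels = F.argmax(axis=1)
print(labels)
```

In this toy setup the patches of image B begin with both labels and are disambiguated only through their graph neighbors in images A and C, which is the intuition behind "closely related segments usually have similar labels".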


Pages: 59-72

Page count: 14

References: 39
