Weakly-supervised scene parsing with multiple contextual cues

Cited by: 3
Authors
Li, Teng [1]
Wu, Xinyu [2]
Ni, Bingbing [3]
Lu, Ke [4]
Yan, Shuicheng [5]
Affiliations
[1] Anhui Univ, Coll Elect Engn & Automat, Hefei, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Beijing 100864, Peoples R China
[3] Adv Digital Sci Ctr, Singapore 138632, Singapore
[4] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[5] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117548, Singapore
Keywords
Scene parsing; Weakly-supervised; Multiple context; IMAGE; CLASSIFICATION; KERNELS
DOI
10.1016/j.ins.2015.06.024
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Scene parsing, which assigns a semantic label to every region of an image, is one of the core problems of computer vision. Previous methods usually rely on patch-level models trained from well-labeled data. In this paper, we propose a weakly-supervised scene parsing algorithm that semantically parses a collection of images carrying image-level multi-labels. The parsing is guided by top-down category models and by bottom-up local patch contexts across images, on the premise that closely related segments usually have similar labels. Images are segmented into patches at multiple levels, and the contextual relations among patches are discovered via sparse representation by ℓ1 minimization, from which a graph is constructed. The multi-level spatial context of patches is also embedded in the graph, so that image-level labels can be propagated to segments optimally. The contextual patch labeling process is formulated in an optimization framework and solved by a convergent iterative method. The category models are learned from the decomposed label representations of the image set and applied to the segments. The final labeling is obtained by combining all the information at the pixel level. The effectiveness of the proposed method is demonstrated by experiments on two benchmark datasets, and comparisons are reported. (C) 2015 Elsevier Inc. All rights reserved.
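The graph-based part of the abstract (sparse ℓ1 reconstruction of patches, a graph built from the coefficients, labels propagated over that graph) can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract does not give the objective or the solver, so the sketch assumes a plain lasso objective solved by ISTA and swaps in standard normalized-Laplacian label propagation with a closed-form solution. The names `sparse_affinity`, `propagate_labels`, `lam`, and `alpha` are hypothetical, and the multi-level spatial context and category models are omitted.

```python
import numpy as np


def sparse_affinity(X, lam=0.05, n_iter=200):
    """Build a patch affinity matrix from l1-regularised reconstructions.

    Each row of X (one patch feature vector) is reconstructed from all other
    rows by minimising 0.5*||D a - x||^2 + lam*||a||_1 with ISTA.
    Assumption: lam and n_iter are illustrative values, not from the paper.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        D = X[others].T                      # dictionary: one column per other patch
        x = X[i]
        step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
        a = np.zeros(n - 1)
        for _ in range(n_iter):
            z = a - step * (D.T @ (D @ a - x))            # gradient step
            a = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
        W[i, others] = np.abs(a)             # coefficient magnitude as edge weight
    return W


def propagate_labels(W, Y, alpha=0.9):
    """Propagate initial label scores Y over the graph W.

    Uses the closed form F = (I - alpha*S)^{-1} Y with S = D^{-1/2} W D^{-1/2},
    a standard label-propagation scheme assumed here for illustration.
    """
    W = 0.5 * (W + W.T)                      # symmetrise the affinity
    d = W.sum(axis=1)
    d[d == 0] = 1.0                          # guard isolated nodes
    S = W / np.sqrt(np.outer(d, d))
    return np.linalg.solve(np.eye(W.shape[0]) - alpha * S, Y)


# Toy data: six 2-D patch features in two loose groups, two candidate labels.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
              [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]])
Y = np.zeros((6, 2))
Y[0, 0] = 1.0                                # patch 0 seeded with label 0
Y[3, 1] = 1.0                                # patch 3 seeded with label 1
F = propagate_labels(sparse_affinity(X), Y)
print(F.argmax(axis=1))                      # per-patch label after propagation
```

In a weakly-supervised setting the seed matrix `Y` would instead spread each image's label set over all of its patches; the single-seed setup above is only meant to keep the toy example small.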
Pages: 59-72
Page count: 14