HCP: A Flexible CNN Framework for Multi-Label Image Classification

Cited by: 594
Authors
Wei, Yunchao [1, 2, 3]
Xia, Wei [3]
Lin, Min [3]
Huang, Junshi [3]
Ni, Bingbing [4]
Dong, Jian [3]
Zhao, Yao [1, 2]
Yan, Shuicheng [3]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China
[2] Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China
[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117548, Singapore
[4] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200030, Peoples R China
Keywords
Deep Learning; CNN; Multi-label Classification
DOI
10.1109/TPAMI.2015.2491929
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Convolutional neural networks (CNNs) have demonstrated promising performance in single-label image classification tasks. However, how a CNN best copes with multi-label images remains an open problem, mainly because of the complex underlying object layouts and the scarcity of multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), in which an arbitrary number of object segment hypotheses are taken as inputs, a shared CNN is connected with each hypothesis, and the CNN outputs from the different hypotheses are aggregated with max pooling to produce the final multi-label predictions. Unique characteristics of this infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) the shared CNN is flexible and can be pre-trained on a large-scale single-label image dataset, e.g., ImageNet; and 4) it can naturally output multi-label prediction results. Experimental results on the Pascal VOC 2007 and VOC 2012 multi-label image datasets demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-art methods. In particular, on the VOC 2012 dataset, the mAP reaches 90.5% with HCP alone and 93.2% after fusion with our complementary result in [12], which is based on hand-crafted features.
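The aggregation step described in the abstract (one shared network applied to every hypothesis, followed by max pooling across hypotheses) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `hcp_predict`, `toy_cnn` and `TOY_SCORES`, the hand-written scores, and the 0.5 decision threshold are all assumptions made for illustration, and a lookup table stands in for the real CNN.

```python
from typing import Callable, List, Sequence

Scores = List[float]


def hcp_predict(hypotheses: Sequence[object],
                shared_cnn: Callable[[object], Scores]) -> Scores:
    """Score every hypothesis with one shared model, then max-pool per class."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    # The same scorer is applied to every hypothesis (weights are shared).
    per_hypothesis = [shared_cnn(h) for h in hypotheses]
    # Cross-hypothesis max pooling: for each class keep the highest score,
    # so low scores from noisy or redundant hypotheses are ignored.
    return [max(class_scores) for class_scores in zip(*per_hypothesis)]


# Hypothetical stand-in for the shared CNN: hand-written scores for 3 classes.
TOY_SCORES = {
    "crop_a": [0.9, 0.1, 0.2],
    "crop_b": [0.2, 0.8, 0.1],
    "crop_c": [0.3, 0.2, 0.1],
}


def toy_cnn(hypothesis: str) -> Scores:
    return TOY_SCORES[hypothesis]


pooled = hcp_predict(["crop_a", "crop_b", "crop_c"], toy_cnn)
# Assumed threshold of 0.5 to turn pooled scores into a label set.
labels = [i for i, score in enumerate(pooled) if score >= 0.5]
print(pooled)  # [0.9, 0.8, 0.2]
print(labels)  # [0, 1]
```

Because the pooling is an element-wise maximum, the number of hypotheses can vary from image to image without changing the size of the output vector, which matches the "arbitrary number of hypotheses" property stated in the abstract.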
Pages: 1901-1907
Page count: 7
Related papers
44 entries in total
  • [11] Chen Q, 2012, PROC CVPR IEEE, P3426, DOI 10.1109/CVPR.2012.6248083
  • [12] BING: Binarized Normed Gradients for Objectness Estimation at 300fps
    Cheng, Ming-Ming
    Zhang, Ziming
    Lin, Wen-Yan
    Torr, Philip
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 3286 - 3293
  • [13] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
  • [14] Subcategory-aware Object Classification
    Dong, Jian
    Xia, Wei
    Chen, Qiang
    Feng, Jiashi
    Huang, Zhongyang
    Yan, Shuicheng
    [J]. 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 827 - 834
  • [15] The Pascal Visual Object Classes (VOC) Challenge
    Everingham, Mark
    Van Gool, Luc
    Williams, Christopher K. I.
    Winn, John
    Zisserman, Andrew
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2010, 88 (02) : 303 - 338
  • [16] Rich feature hierarchies for accurate object detection and semantic segmentation
    Girshick, Ross
    Donahue, Jeff
    Darrell, Trevor
    Malik, Jitendra
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 580 - 587
  • [17] Gong Y., 2013, CoRR
  • [18] Griffin G., 2007, CALTECH 256 OBJECT C
  • [19] Combining efficient object localization and image classification
    Harzallah, Hedi
    Jurie, Frederic
    Schmid, Cordelia
    [J]. 2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2009, : 237 - 244
  • [20] He KM, 2014, LECT NOTES COMPUT SC, V8691, P346, DOI [arXiv:1406.4729, 10.1007/978-3-319-10578-9_23]