HCP: A Flexible CNN Framework for Multi-Label Image Classification

Cited by: 617

Authors:

Wei, Yunchao [1,2,3]

Xia, Wei [3]

Lin, Min [3]

Huang, Junshi [3]

Ni, Bingbing [4]

Dong, Jian [3]

Zhao, Yao [1,2]

Yan, Shuicheng [3]

Affiliations:

[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China

[2] Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China

[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117548, Singapore

[4] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200030, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2016 / Vol. 38 / No. 09

Keywords:

Deep Learning; CNN; Multi-label Classification;

DOI:

10.1109/TPAMI.2015.2491929

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Convolutional Neural Network (CNN) has demonstrated promising performance in single-label image classification tasks. However, how CNN best copes with multi-label images still remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), where an arbitrary number of object segment hypotheses are taken as the inputs, then a shared CNN is connected with each hypothesis, and finally the CNN output results from different hypotheses are aggregated with max pooling to produce the ultimate multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) the shared CNN is flexible and can be well pre-trained with a large-scale single-label image dataset, e.g., ImageNet; and 4) it may naturally output multi-label prediction results. Experimental results on the Pascal VOC 2007 and VOC 2012 multi-label image datasets demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-art methods. In particular, on the VOC 2012 dataset the mAP reaches 90.5% by HCP only and 93.2% after the fusion with our complementary result in [12] based on hand-crafted features.
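The aggregation step described in the abstract (a shared CNN applied to each hypothesis, followed by max pooling across hypotheses) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `hcp_predict` and `toy_cnn` are hypothetical, and the "CNN" here is a stand-in that simply returns precomputed per-label scores for each hypothesis.

```python
from typing import Callable, List, Sequence


def hcp_predict(
    hypotheses: Sequence[Sequence[float]],
    shared_cnn: Callable[[Sequence[float]], List[float]],
) -> List[float]:
    """Apply one shared scoring function to every hypothesis, then take the
    per-label maximum across hypotheses (cross-hypothesis max pooling)."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    # The same function (shared weights) scores each hypothesis independently.
    per_hypothesis = [shared_cnn(h) for h in hypotheses]
    num_labels = len(per_hypothesis[0])
    # For each label, keep the highest score found in any hypothesis.
    return [max(scores[c] for scores in per_hypothesis) for c in range(num_labels)]


def toy_cnn(hypothesis: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the shared CNN: treats the input as
    already-computed label scores and returns them unchanged."""
    return list(hypothesis)


# Three hypotheses, three labels. The third hypothesis is "noisy"
# (low scores everywhere) and does not affect the pooled result.
image_scores = hcp_predict(
    [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.1, 0.1]],
    toy_cnn,
)
print(image_scores)  # [0.9, 0.7, 0.1]
```

Because only the maximum per label survives, a hypothesis with uniformly low scores leaves the image-level prediction unchanged, which matches the robustness to noisy or redundant hypotheses that the abstract claims.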

Citation

Pages: 1901-1907

Page count: 7

44 references in total

[11] Chen Q, 2012, PROC CVPR IEEE, P3426, DOI 10.1109/CVPR.2012.6248083

[12] Cheng, Ming-Ming; Zhang, Ziming; Lin, Wen-Yan; Torr, Philip. BING: Binarized Normed Gradients for Objectness Estimation at 300fps. 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014: 3286-3293

[13] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848

[14] Dong, Jian; Xia, Wei; Chen, Qiang; Feng, Jianshi; Huang, Zhongyang; Yan, Shuicheng. Subcategory-aware Object Classification. 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013: 827-834

[15] Everingham, Mark; Van Gool, Luc; Williams, Christopher K. I.; Winn, John; Zisserman, Andrew. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 2010, 88 (02): 303-338

[16] Girshick, Ross; Donahue, Jeff; Darrell, Trevor; Malik, Jitendra. Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014: 580-587

[17] Gong Y., 2013, CoRR

[18] Griffin G., 2007, CALTECH 256 OBJECT C

[19] Harzallah, Hedi; Jurie, Frederic; Schmid, Cordelia. Combining efficient object localization and image classification. 2009 IEEE 12th International Conference on Computer Vision (ICCV), 2009: 237-244

[20] He KM, 2014, LECT NOTES COMPUT SC, V8691, P346, DOI [arXiv:1406.4729, 10.1007/978-3-319-10578-9_23]
