HCP: A Flexible CNN Framework for Multi-Label Image Classification

Cited by: 614

Authors:

Wei, Yunchao [1,2,3]

Xia, Wei [3]

Lin, Min [3]

Huang, Junshi [3]

Ni, Bingbing [4]

Dong, Jian [3]

Zhao, Yao [1,2]

Yan, Shuicheng [3]

Affiliations:

[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing 100044, Peoples R China

[2] Beijing Key Lab Adv Informat Sci & Network Techno, Beijing 100044, Peoples R China

[3] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117548, Singapore

[4] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200030, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2016 / Vol. 38 / No. 09

Keywords:

Deep Learning; CNN; Multi-label Classification;

DOI:

10.1109/TPAMI.2015.2491929

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Convolutional Neural Networks (CNNs) have demonstrated promising performance in single-label image classification tasks. However, how CNNs best cope with multi-label images remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), where an arbitrary number of object segment hypotheses are taken as the inputs, a shared CNN is connected with each hypothesis, and the CNN outputs from the different hypotheses are aggregated with max pooling to produce the final multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) the shared CNN is flexible and can be well pre-trained with a large-scale single-label image dataset, e.g., ImageNet; and 4) it may naturally output multi-label prediction results. Experimental results on the Pascal VOC 2007 and VOC 2012 multi-label image datasets demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-art methods. In particular, on the VOC 2012 dataset the mAP reaches 90.5% with HCP alone and 93.2% after fusion with our complementary result in [12], which is based on hand-crafted features.
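The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes each hypothesis has already been reduced to a small feature vector and that one scoring function is shared across all hypotheses. The names `hcp_predict`, `toy_shared_scorer`, and `WEIGHTS` are hypothetical, and the linear scorer is only a stand-in for the shared CNN; this is not the authors' implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical weights for a toy 3-class, 2-feature linear scorer.
# In HCP this role is played by the shared CNN; the values are made up.
WEIGHTS = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]


def toy_shared_scorer(features: Sequence[float]) -> List[float]:
    """Stand-in for the shared CNN: one score per class for one hypothesis."""
    return [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]


def hcp_predict(
    hypotheses: Sequence[Sequence[float]],
    shared_scorer: Callable[[Sequence[float]], List[float]],
) -> List[float]:
    """Score every hypothesis with the same scorer, then max-pool per class."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    # The same scorer (shared parameters) is applied to each hypothesis.
    per_hypothesis = [shared_scorer(h) for h in hypotheses]
    num_classes = len(per_hypothesis[0])
    if any(len(scores) != num_classes for scores in per_hypothesis):
        raise ValueError("all hypotheses must yield the same number of classes")
    # Cross-hypothesis max pooling: keep the strongest response for each class.
    return [max(scores[c] for scores in per_hypothesis) for c in range(num_classes)]


if __name__ == "__main__":
    hyps = [[0.9, 0.1], [0.2, 0.8]]
    print(hcp_predict(hyps, toy_shared_scorer))
```

Because the maximum over hypotheses is taken independently for each class, adding a duplicate hypothesis, or one whose scores are uniformly low, leaves the pooled prediction unchanged. This is one way to read the abstract's claim of robustness to redundant or noisy hypotheses.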


Pages: 1901-1907

Page count: 7

References (44 in total):

[1] Alexe, Bogdan; Deselaers, Thomas; Ferrari, Vittorio. Measuring the Objectness of Image Windows [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (11): 2189-2202

[2] [Anonymous], 2014, ARXIV14031840

[3] [Anonymous], 2013, Decaf: A deep convolutional activation feature for generic visual recognition

[4] [Anonymous], 2013, Caffe: An Open Source Convolutional Architecture for Fast Feature Embedding

[5] Arbelaez, Pablo; Pont-Tuset, Jordi; Barron, Jonathan T.; Marques, Ferran; Malik, Jitendra. Multiscale Combinatorial Grouping [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014: 328-335

[6] Breiman, L. Random forests [J]. MACHINE LEARNING, 2001, 45 (01): 5-32

[7] Carreira, Joao; Sminchisescu, Cristian. CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (07): 1312-1328

[8] Chang, Chih-Chung; Lin, Chih-Jen. LIBSVM: A Library for Support Vector Machines [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)

[9] Chatfield, Ken; Lempitsky, Victor; Vedaldi, Andrea; Zisserman, Andrew. The devil is in the details: an evaluation of recent feature encoding methods [J]. PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2011, 2011

[10] Chen, Qiang; Song, Zheng; Dong, Jian; Huang, Zhongyang; Hua, Yang; Yan, Shuicheng. Contextualizing Object Detection and Classification [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (01): 13-27
