Human Parsing with Contextualized Convolutional Neural Network

Cited by: 143
Authors
Liang, Xiaodan [1 ,2 ]
Xu, Chunyan [2 ]
Shen, Xiaohui [3 ]
Yang, Jianchao [5 ]
Liu, Si [6 ]
Tang, Jinhui [4 ]
Lin, Liang [1 ]
Yan, Shuicheng [2 ]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China
[2] Natl Univ Singapore, Singapore 117548, Singapore
[3] Adobe Res, San Jose, CA USA
[4] Nanjing Univ Sci & Technol, Nanjing, Jiangsu, Peoples R China
[5] Snapchat Res, Venice, CA USA
[6] Chinese Acad Sci, Inst Informat Engn, State Key Lab Informat Secur, Beijing 100864, Peoples R China
Source
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2015
Keywords
DOI
10.1109/ICCV.2015.163
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which integrates cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network. Given an input human image, Co-CNN produces the pixel-wise categorization in an end-to-end manner. First, the cross-layer context is captured by our basic local-to-global-to-local structure, which hierarchically combines global semantic information and local fine details across different convolutional layers. Second, the global image-level label prediction is used as an auxiliary objective in the intermediate layer of the Co-CNN, and its outputs are further used to guide the feature learning in subsequent convolutional layers, thereby leveraging the global image-level context. Finally, to further utilize the local super-pixel contexts, within-super-pixel smoothing and cross-super-pixel neighborhood voting are formulated as natural sub-components of the Co-CNN to achieve local label consistency in both the training and testing processes. Comprehensive evaluations on two public datasets demonstrate the significant superiority of our Co-CNN over other state-of-the-art methods for human parsing. In particular, the F-1 score on the large dataset [15] reaches 76.95% with Co-CNN, significantly higher than the 62.81% and 64.38% achieved by the state-of-the-art algorithms M-CNN [21] and ATR [15], respectively.
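The abstract outlines three architectural ideas: a local-to-global-to-local encoder-decoder that fuses cross-layer features, an auxiliary image-level label prediction fed back into later layers, and super-pixel based smoothing for local label consistency. Below is a minimal, illustrative PyTorch sketch of how such pieces could be wired together; the names (CoCNNSketch, within_superpixel_smoothing), layer widths, kernel sizes, and the default of 18 labels are assumptions for demonstration only and do not reproduce the authors' released model.

# Illustrative sketch only; hyperparameters and helper names are assumptions,
# not the authors' implementation of Co-CNN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoCNNSketch(nn.Module):
    def __init__(self, num_classes=18, num_image_labels=18):
        super().__init__()
        # "Local-to-global": downsampling convolutions capture global semantics.
        self.down1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                   nn.MaxPool2d(2))
        self.down2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                   nn.MaxPool2d(2))
        # Auxiliary image-level label head attached to an intermediate layer.
        self.image_label_head = nn.Linear(128, num_image_labels)
        # "Global-to-local": upsampling path; predicted image-level label
        # probabilities are concatenated back into the feature maps.
        self.up1 = nn.Sequential(nn.Conv2d(128 + num_image_labels, 64, 3, padding=1),
                                 nn.ReLU())
        self.up2 = nn.Sequential(nn.Conv2d(64 + 64, 64, 3, padding=1), nn.ReLU())
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        f1 = self.down1(x)                       # local features, 1/2 resolution
        f2 = self.down2(f1)                      # global features, 1/4 resolution
        pooled = f2.mean(dim=(2, 3))             # global average pooling
        image_logits = self.image_label_head(pooled)
        # Broadcast image-level label probabilities as extra feature channels.
        probs = torch.sigmoid(image_logits)[:, :, None, None]
        probs = probs.expand(-1, -1, f2.size(2), f2.size(3))
        g = self.up1(torch.cat([f2, probs], dim=1))
        g = F.interpolate(g, scale_factor=2, mode='bilinear', align_corners=False)
        g = self.up2(torch.cat([g, f1], dim=1))  # cross-layer fusion with fine details
        g = F.interpolate(g, scale_factor=2, mode='bilinear', align_corners=False)
        pixel_logits = self.classifier(g)
        return pixel_logits, image_logits


def within_superpixel_smoothing(pixel_probs, superpixels):
    """Average class probabilities over each super-pixel for local label consistency.

    pixel_probs: (C, H, W) softmax scores; superpixels: (H, W) integer segment ids
    from any over-segmentation method (an assumption here, e.g. SLIC).
    """
    smoothed = pixel_probs.clone()
    for sp_id in superpixels.unique():
        mask = superpixels == sp_id                       # pixels in this segment
        mean_vec = pixel_probs[:, mask].mean(dim=1)       # (C,) segment-mean score
        smoothed[:, mask] = mean_vec.unsqueeze(1).expand(-1, int(mask.sum()))
    return smoothed

In a training setup following the abstract, the per-pixel segmentation loss and the auxiliary image-level label loss would be optimized jointly, e.g. a per-pixel cross-entropy on pixel_logits plus a multi-label binary cross-entropy on image_logits; the exact weighting here would be an assumption, not a value from the paper.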
Pages: 1386-1394
Number of pages: 9
References (33 items in total)
[1] [Anonymous], 2014, P IEEE C COMP VIS PA
[2] [Anonymous], 2014, ARXIV14115752
[3] [Anonymous], 2015, ICCV
[4] [Anonymous], 2015, CVPR
[5] [Anonymous], 2014, COMPUTER VISION PATT
[6] [Anonymous], 2015, ICCV
[7] [Anonymous], 2013, P 3 ACM C INT C MULT
[8] [Anonymous], 2015, CVPR
[9] [Anonymous], IEEE T IMAGE PROCESS
[10] Carreira, Joao; Sminchisescu, Cristian. CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34(07): 1312-1328