Human Parsing with Contextualized Convolutional Neural Network

Cited by: 143
Authors
Liang, Xiaodan [1 ,2 ]
Xu, Chunyan [2 ]
Shen, Xiaohui [3 ]
Yang, Jianchao [5 ]
Liu, Si [6 ]
Tang, Jinhui [4 ]
Lin, Liang [1 ]
Yan, Shuicheng [2 ]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China
[2] Natl Univ Singapore, Singapore 117548, Singapore
[3] Adobe Res, San Jose, CA USA
[4] Nanjing Univ Sci & Technol, Nanjing, Jiangsu, Peoples R China
[5] Snapchat Res, Venice, CA USA
[6] Chinese Acad Sci, Inst Informat Engn, State Key Lab Informat Secur, Beijing 100864, Peoples R China
Source
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2015
Keywords
DOI
10.1109/ICCV.2015.163
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which integrates cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network. Given an input human image, Co-CNN produces the pixel-wise categorization in an end-to-end manner. First, the cross-layer context is captured by our basic local-to-global-to-local structure, which hierarchically combines global semantic information and local fine details across different convolutional layers. Second, the global image-level label prediction is used as an auxiliary objective in the intermediate layer of the Co-CNN, and its outputs are further used to guide the feature learning in subsequent convolutional layers, thereby leveraging the global image-level context. Finally, to further utilize the local super-pixel contexts, within-super-pixel smoothing and cross-super-pixel neighborhood voting are formulated as natural sub-components of the Co-CNN to achieve local label consistency in both the training and testing processes. Comprehensive evaluations on two public datasets demonstrate the significant superiority of our Co-CNN over other state-of-the-art methods for human parsing. In particular, the F-1 score on the large dataset [15] reaches 76.95% with Co-CNN, significantly higher than the 62.81% and 64.38% achieved by the state-of-the-art algorithms M-CNN [21] and ATR [15], respectively.
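The abstract outlines three architectural ideas: a local-to-global-to-local encoder-decoder that fuses cross-layer features, an auxiliary image-level label prediction fed back into later layers, and super-pixel based smoothing for local label consistency. Below is a minimal, illustrative PyTorch sketch of how such pieces could be wired together; the names (CoCNNSketch, within_superpixel_smoothing), layer widths, kernel sizes, and the default of 18 labels are assumptions for demonstration only and do not reproduce the authors' released model.

# Illustrative sketch only; hyperparameters and helper names are assumptions,
# not the authors' implementation of Co-CNN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoCNNSketch(nn.Module):
    def __init__(self, num_classes=18, num_image_labels=18):
        super().__init__()
        # "Local-to-global": downsampling convolutions capture global semantics.
        self.down1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                   nn.MaxPool2d(2))
        self.down2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                   nn.MaxPool2d(2))
        # Auxiliary image-level label head attached to an intermediate layer.
        self.image_label_head = nn.Linear(128, num_image_labels)
        # "Global-to-local": upsampling path; predicted image-level label
        # probabilities are concatenated back into the feature maps.
        self.up1 = nn.Sequential(nn.Conv2d(128 + num_image_labels, 64, 3, padding=1),
                                 nn.ReLU())
        self.up2 = nn.Sequential(nn.Conv2d(64 + 64, 64, 3, padding=1), nn.ReLU())
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        f1 = self.down1(x)                       # local features, 1/2 resolution
        f2 = self.down2(f1)                      # global features, 1/4 resolution
        pooled = f2.mean(dim=(2, 3))             # global average pooling
        image_logits = self.image_label_head(pooled)
        # Broadcast image-level label probabilities as extra feature channels.
        probs = torch.sigmoid(image_logits)[:, :, None, None]
        probs = probs.expand(-1, -1, f2.size(2), f2.size(3))
        g = self.up1(torch.cat([f2, probs], dim=1))
        g = F.interpolate(g, scale_factor=2, mode='bilinear', align_corners=False)
        g = self.up2(torch.cat([g, f1], dim=1))  # cross-layer fusion with fine details
        g = F.interpolate(g, scale_factor=2, mode='bilinear', align_corners=False)
        pixel_logits = self.classifier(g)
        return pixel_logits, image_logits


def within_superpixel_smoothing(pixel_probs, superpixels):
    """Average class probabilities over each super-pixel for local label consistency.

    pixel_probs: (C, H, W) softmax scores; superpixels: (H, W) integer segment ids
    from any over-segmentation method (an assumption here, e.g. SLIC).
    """
    smoothed = pixel_probs.clone()
    for sp_id in superpixels.unique():
        mask = superpixels == sp_id                       # pixels in this segment
        mean_vec = pixel_probs[:, mask].mean(dim=1)       # (C,) segment-mean score
        smoothed[:, mask] = mean_vec.unsqueeze(1).expand(-1, int(mask.sum()))
    return smoothed

In a training setup following the abstract, the per-pixel segmentation loss and the auxiliary image-level label loss would be optimized jointly, e.g. a per-pixel cross-entropy on pixel_logits plus a multi-label binary cross-entropy on image_logits; the exact weighting here would be an assumption, not a value from the paper.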
Pages: 1386-1394
Number of pages: 9
References (33 items in total)
[1] [Anonymous], 2014, P IEEE C COMP VIS PA
[2] [Anonymous], 2014, ARXIV14115752
[3] [Anonymous], 2015, ICCV
[4] [Anonymous], 2015, CVPR
[5] [Anonymous], 2014, COMPUTER VISION PATT
[6] [Anonymous], 2015, ICCV
[7] [Anonymous], 2013, P 3 ACM C INT C MULT
[8] [Anonymous], 2015, CVPR
[9] [Anonymous], IEEE T IMAGE PROCESS
[10] Carreira, Joao; Sminchisescu, Cristian. CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34(07): 1312-1328