Kinematic skeleton graph augmented network for human parsing

Cited by: 7
Authors
Liu, Jinde [1 ,2 ]
Zhang, Zhang [1 ,2 ]
Shan, Caifeng [3 ,4 ]
Tan, Tieniu [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Beijing, Peoples R China
[2] Chinese Acad Sci, Ctr Res Intelligent Percept & Comp, Inst Automat, Beijing, Peoples R China
[3] CAS AIR, Artificial Intelligence Res, Beijing, Peoples R China
[4] Shandong Univ Sci & Technol, Coll Elect Engn & Automat, Qingdao, Peoples R China
Keywords
Image segmentation; Human parsing; Deeplab V3+; Kinematic skeleton graph; Human parsing dataset; SEGMENTATION; MULTITASK;
DOI
10.1016/j.neucom.2020.07.002
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human parsing, the task of labeling pixels in human images with fine-grained semantic part categories, has achieved significant progress during the past decade. However, several challenges remain due to occlusions, varying poses and the similar appearance of left/right parts. To tackle these problems, a Human Kinematic Skeleton Graph Layer (HKSGL) is proposed to augment regular neural networks with human kinematic skeleton information. The HKSGL has two major components: a kinematic skeleton graph and an interconnected modular neural layer. The kinematic skeleton graph is a user pre-defined skeleton graph that models the interconnections between different semantic parts. The skeleton graph is then passed to the interconnected modular neural layer, which is composed of a set of modular blocks corresponding to different semantic parts. The HKSGL is a lightweight, low-cost layer that can be easily attached to any existing neural network. To demonstrate the power of the HKSGL, a new dataset for human parsing under occlusions, termed RAP-Occ, is also collected. Extensive experiments have been performed on four human parsing datasets: LIP, CIHP, ATR and RAP-Occ. Two popular baselines, i.e., Deeplab V3+ and CE2P, are augmented with the proposed HKSGL. The augmented models achieve competitive performance in comparison with state-of-the-art methods. (C) 2020 Elsevier B.V. All rights reserved.
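
The abstract only outlines the HKSGL design, so the following minimal PyTorch-style sketch illustrates one plausible reading of it: one modular block per semantic part, features exchanged along a user pre-defined skeleton graph, and the result fused back into the backbone feature map. The class name, edge list, channel sizes and the additive message passing below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

# Illustrative skeleton edges between semantic parts (indices are placeholder
# part labels, e.g. 0=head, 1=torso, 2=left arm, ...); the paper's graph is
# user pre-defined and may differ.
SKELETON_EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (1, 5)]

class KinematicSkeletonGraphLayer(nn.Module):
    """Assumed structure: one lightweight block per part plus additive
    message passing along the pre-defined kinematic skeleton graph."""

    def __init__(self, in_channels, num_parts, edges=SKELETON_EDGES):
        super().__init__()
        self.edges = edges
        # One modular 1x1 convolution block per semantic part.
        self.part_blocks = nn.ModuleList(
            [nn.Conv2d(in_channels, in_channels, kernel_size=1)
             for _ in range(num_parts)]
        )
        # Fuse the part-aware features back to the backbone channel width.
        self.fuse = nn.Conv2d(in_channels * num_parts, in_channels, kernel_size=1)

    def forward(self, x):
        # Part-specific features computed from the shared backbone map x.
        parts = [block(x) for block in self.part_blocks]
        # Exchange information between parts connected in the skeleton graph.
        messages = list(parts)
        for i, j in self.edges:
            messages[i] = messages[i] + parts[j]
            messages[j] = messages[j] + parts[i]
        return self.fuse(torch.cat(messages, dim=1))

# Example: attach the layer to a 256-channel backbone feature map
# (e.g. the decoder output of Deeplab V3+ or CE2P).
features = torch.randn(2, 256, 64, 64)
layer = KinematicSkeletonGraphLayer(in_channels=256, num_parts=6)
augmented = layer(features)   # shape stays (2, 256, 64, 64)

Because the graph is fixed and the per-part blocks are small 1x1 convolutions, a layer of this form adds little overhead, which is consistent with the abstract's claim that the HKSGL is lightweight and easily attached to existing networks.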
Pages: 457-470
Page count: 14