End-to-End Detection-Segmentation System for Face Labeling

Cited by: 49
Authors
Wen, Shiping [1 ]
Dong, Minghui [2 ]
Yang, Yin [3 ]
Zhou, Pan [4 ]
Huang, Tingwen [5 ]
Chen, Yiran [6 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Automat & Artificial Intelligence, Wuhan 430074, Peoples R China
[3] Hamad Bin Khalifa Univ, Coll Sci Engn & Technol, Doha 5855, Qatar
[4] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China
[5] Texas A&M Univ Qatar, Dept Sci Program, Doha 23874, Qatar
[6] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2021, Vol. 5, Issue 3
Keywords
Face; Labeling; Semantics; Image segmentation; Feature extraction; Convolution; Kernel; Detection-segmentation; face labeling; multi-face; FULLY CONVOLUTIONAL NETWORK; CLASSIFICATION;
DOI
10.1109/TETCI.2019.2947319
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose an end-to-end detection-segmentation system for detailed face labeling. Fully convolutional networks (FCNs) have become the mainstream approach to semantic segmentation due to their state-of-the-art performance. However, a generic FCN usually produces overly smooth, homogeneous results, and when the semantic categories are highly imbalanced, as in face labeling, the features of minority categories are not well learned. To alleviate these problems, a face image is first encoded into multi-level feature maps by a pyramid FCN; features of the different facial components are then extracted separately according to the bounding boxes provided by a one-stage detection head. Three class-specific sub-networks process the extracted features to produce the corresponding segmentation results, while the skin-hair region is decoded directly from the back end of the pyramid FCN. Finally, the overall segmentation is obtained by combining the branches. Moreover, the proposed method, trained on a single-face labeled dataset, can be applied directly to detailed multi-face labeling without any network modification, additional modules, or extra data. The whole structure can be trained end-to-end while maintaining a small network size (12 MB). Experiments show that the proposed method generates more accurate (single- or multi-) face labeling results than previous works and achieves state-of-the-art results on the HELEN face dataset.
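The pipeline the abstract describes (encode the image, crop per-component features at the detected bounding boxes, run class-specific heads, then merge the branch outputs with the skin-hair decoding) can be sketched roughly as below. This is a minimal NumPy illustration of the data flow only; all function names, shapes, and the toy thresholding "head" are assumptions for illustration, not the authors' actual code.

```python
import numpy as np

def crop_roi(feature_map, box):
    """Extract the feature sub-region for one detected facial component.
    feature_map: (C, H, W) array; box: (x1, y1, x2, y2) in feature coordinates."""
    x1, y1, x2, y2 = box
    return feature_map[:, y1:y2, x1:x2]

def label_faces(feature_map, boxes, heads, skin_hair_mask):
    """Combine class-specific branch outputs into one label map.
    boxes: list of (component_name, box); heads: dict mapping component
    name -> function(features) -> boolean mask for that ROI."""
    H, W = feature_map.shape[1:]
    labels = np.zeros((H, W), dtype=np.int32)
    labels[skin_hair_mask] = 1          # skin/hair decoded from the FCN back end
    for class_id, (name, box) in enumerate(boxes, start=2):
        x1, y1, x2, y2 = box
        mask = heads[name](crop_roi(feature_map, box))
        region = labels[y1:y2, x1:x2]
        region[mask] = class_id         # component labels override the skin label
    return labels

# Toy run: one hypothetical 'eye' component whose head thresholds the channel mean.
feat = np.random.rand(8, 16, 16)
heads = {"eye": lambda f: f.mean(axis=0) > 0.5}
skin = np.ones((16, 16), dtype=bool)
out = label_faces(feat, [("eye", (2, 2, 6, 6))], heads, skin)
```

In the actual system each `heads[name]` would be a learned sub-network and the skin-hair mask a decoder output, but the merge logic (component labels written into a background skin-hair map) follows the branch-combination step the abstract describes.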
Pages: 457-467
Page count: 11