LIGHTWEIGHT FACIAL LANDMARK DETECTION WITH WEAKLY SUPERVISED LEARNING

Times cited: 0

Authors:

Lai, Shenqi [1]

Liu, Lei [1]

Chai, Zhenhua [1]

Wei, Xiaolin [1]

Affiliation:

[1] Meituan, Beijing, Peoples R China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW) | 2021

Keywords:

Facial Landmark Detection; Single Layer Coordinate Attention; Dual Soft Argmax; Coarse Localization Regulation; Weakly Supervised Learning;

DOI:

10.1109/ICMEW53276.2021.9455973

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

A robust facial landmark detection framework is proposed in this paper; it can be trained end to end and achieved promising detection accuracy in the 3rd Grand Challenge of 106-Point Facial Landmark Localization. First, given an upper bound of 100 MFLOPs on computational complexity and a model size of 2 MB, we design a new backbone named ShuffleNeXt. Based on ShuffleNetV2, it replaces the standard depthwise convolution layers with groupwise convolution layers and the ReLU activation with the Swish function. Furthermore, we design a single layer coordinate attention module to capture spatial and channel information, which performs better than both coordinate attention and the squeeze-and-excitation module. To avoid the accuracy loss caused by coordinate quantization, dual soft argmax is used to map heatmap responses to coordinates. In addition, a coarse localization regulation is proposed to further improve performance. Finally, we introduce weakly supervised learning to increase the number of training samples: we train the model and use it to re-label the large-scale CelebA dataset. Because the original CelebA provides only 5-point annotations, we compute the NME on these 5 points. With the NME threshold set to 1%, 162,731 face images meet the criterion, so the training set grows from 20,384 to 183,115 images, an increase of roughly 800% over the original dataset. The best AUC of 79.38% is achieved on the validation set.
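The abstract relies on two mechanisms that are easy to state but harder to picture: decoding heatmaps into coordinates with a soft argmax instead of a quantizing hard argmax, and keeping re-labeled CelebA images only when their 5-point NME is below 1%. The NumPy sketch below illustrates both under stated assumptions. It implements an ordinary (single) soft argmax, not the paper's dual soft argmax, whose details the abstract does not give. The temperature `beta`, the normalizing length `norm` (for example, the square root of the face bounding-box area), the strict `<` comparison, and the helper names `soft_argmax_2d`, `nme` and `select_pseudo_labels` are all hypothetical. The caller is assumed to supply the 5 predicted points that correspond to CelebA's annotations; how those are derived from the 106-point output is not stated in the abstract.

```python
import numpy as np
from typing import List, Sequence


def soft_argmax_2d(heatmaps: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Decode (K, H, W) heatmaps into (K, 2) sub-pixel (x, y) coordinates.

    Assumption: ordinary single soft argmax; the paper's dual soft argmax
    is not reproduced here.
    """
    k, h, w = heatmaps.shape
    logits = beta * heatmaps.reshape(k, -1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=1, keepdims=True)  # spatial softmax per landmark
    prob = prob.reshape(k, h, w)
    xs = np.arange(w, dtype=np.float64)
    ys = np.arange(h, dtype=np.float64)
    # Expected column (x) and row (y) index under each landmark's distribution.
    x = (prob.sum(axis=1) * xs).sum(axis=1)
    y = (prob.sum(axis=2) * ys).sum(axis=1)
    return np.stack([x, y], axis=1)


def nme(pred: np.ndarray, gt: np.ndarray, norm: float) -> float:
    """Mean point-to-point Euclidean error divided by a normalizing length."""
    return float(np.linalg.norm(pred - gt, axis=1).mean() / norm)


def select_pseudo_labels(
    pred5: Sequence[np.ndarray],
    gt5: Sequence[np.ndarray],
    norms: Sequence[float],
    threshold: float = 0.01,  # the 1% NME threshold from the abstract
) -> List[int]:
    """Return indices of images whose 5-point NME is strictly below threshold.

    Hypothetical helper: the strict '<' and the choice of `norms` are assumptions.
    """
    return [
        i
        for i, (p, g, n) in enumerate(zip(pred5, gt5, norms))
        if nme(p, g, n) < threshold
    ]
```

Because the soft argmax returns an expectation over a softmax-normalized heatmap rather than the index of the highest cell, its output is continuous and differentiable, which is the property that avoids the quantization error mentioned in the abstract. A larger `beta` sharpens the distribution and moves the result toward the hard argmax.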

References

Pages: 6

21 references in total

[1] Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources [J]. Bulat, Adrian; Tzimiropoulos, Georgios. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 3726-3734

[2] Devries Terrance, 2017, CORR

[3] Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks [J]. Feng, Zhen-Hua; Kittler, Josef; Awais, Muhammad; Huber, Patrik; Wu, Xiao-Jun. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 2235-2245

[4] Identity Mappings in Deep Residual Networks [J]. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. COMPUTER VISION - ECCV 2016, PT IV, 2016, 9908: 630-645

[5] Deep Residual Learning for Image Recognition [J]. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778

[6] Coordinate Attention for Efficient Mobile Network Design [J]. Hou, Qibin; Zhou, Daquan; Feng, Jiashi. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 13708-13717

[7] Searching for MobileNetV3 [J]. Howard, Andrew; Sandler, Mark; Chu, Grace; Chen, Liang-Chieh; Chen, Bo; Tan, Mingxing; Wang, Weijun; Zhu, Yukun; Pang, Ruoming; Vasudevan, Vijay; Le, Quoc V.; Adam, Hartwig. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 1314-1324

[8] Hu J, 2018, PROC CVPR IEEE, P7132, DOI [10.1109/TPAMI.2019.2913372, 10.1109/CVPR.2018.00745]

[9] IMPROVED HOURGLASS STRUCTURE FOR HIGH PERFORMANCE FACIAL LANDMARK DETECTION [J]. Lai, Shenqi; Chai, Zhenhua; Wei, Xiaoming. 2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2019: 669-672

[10] Lai Shenqi, 2019, P BMVC, P111
