Semantic Alignment: Finding Semantically Consistent Ground-truth for Facial Landmark Detection

Times cited: 57
Authors
Liu, Zhiwei [1, 2]
Zhu, Xiangyu [1]
Hu, Guosheng [4, 5]
Guo, Haiyun [1]
Tang, Ming [1, 3]
Lei, Zhen [1]
Robertson, Neil M. [4, 5]
Wang, Jinqiao [1, 3]
Affiliations
[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Universal AI Inc, ObjectEye Inc, Visionfin Inc, Frisco, TX USA
[4] AnyVision, Holon, Israel
[5] Queens Univ Belfast, Belfast, Antrim, North Ireland
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00358
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, deep-learning-based facial landmark detection has achieved great success. Despite this, we notice that semantic ambiguity greatly degrades detection performance. Specifically, semantic ambiguity means that some landmarks (e.g., those evenly distributed along the face contour) do not have a clear and accurate definition, which causes annotators to label them inconsistently. These inconsistent annotations, which are usually provided by public databases, commonly serve as the ground truth that supervises network training, and this degrades accuracy. To our knowledge, little research has investigated this problem. In this paper, we propose a novel probabilistic model that introduces a latent variable to optimize, namely the 'real' ground truth, which is semantically consistent. This framework couples two parts: (1) training the landmark detection CNN and (2) searching for the 'real' ground truth. The two parts are optimized alternately: the searched 'real' ground truth supervises CNN training, and the trained CNN assists the search for the 'real' ground truth. In addition, to recover landmarks that are predicted with low confidence because of occlusion or low image quality, we propose a global heatmap correction unit (GHCU) that corrects outliers by using the global face shape as a constraint. Extensive experiments on both image-based (300W and AFLW) and video-based (300-VW) databases demonstrate that our method effectively improves landmark detection accuracy and achieves state-of-the-art performance.
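The alternating scheme described in the abstract can be illustrated with a small sketch. This is a hypothetical simplification, not the authors' implementation: it assumes that each landmark's 'real' ground truth is searched only within a small window (`radius`) around the original annotation, and that each candidate pixel is scored by a Gaussian prior centred on the annotation (`sigma`) plus the log of the current CNN heatmap response. The names `search_real_ground_truth`, `alternate`, `predict`, and `train_step`, as well as both parameters, are illustrative assumptions. The CNN is abstracted as two callables, and the GHCU is not modelled.

```python
import math
from typing import Callable, List, Sequence, Tuple

Heatmap = List[List[float]]   # heatmap[row][col]: non-negative CNN response
Point = Tuple[int, int]       # (row, col) pixel position


def search_real_ground_truth(heatmap: Heatmap, annotation: Point,
                             radius: int = 2, sigma: float = 1.5) -> Point:
    """Hypothetical search step: pick the pixel near `annotation` that maximizes
    log prior (Gaussian around the annotation) + log likelihood (heatmap response).
    Assumes `annotation` lies inside the heatmap."""
    ar, ac = annotation
    height, width = len(heatmap), len(heatmap[0])
    best, best_score = annotation, -math.inf
    # Only a local window is searched, so the target cannot drift to a distant peak.
    for r in range(max(0, ar - radius), min(height, ar + radius + 1)):
        for c in range(max(0, ac - radius), min(width, ac + radius + 1)):
            log_prior = -((r - ar) ** 2 + (c - ac) ** 2) / (2 * sigma ** 2)
            log_likelihood = math.log(heatmap[r][c] + 1e-8)  # epsilon avoids log(0)
            score = log_prior + log_likelihood
            if score > best_score:
                best, best_score = (r, c), score
    return best


def alternate(annotations: Sequence[Point],
              predict: Callable[[], List[Heatmap]],
              train_step: Callable[[List[Point]], None],
              rounds: int = 3) -> List[Point]:
    """Hypothetical alternating loop: (1) train on the current targets,
    (2) re-search each 'real' ground truth around the ORIGINAL annotation."""
    targets = list(annotations)
    for _ in range(rounds):
        train_step(targets)        # (1) current targets supervise the CNN
        heatmaps = predict()       # one heatmap per landmark from the trained CNN
        targets = [search_real_ground_truth(hm, a)  # (2) search step
                   for hm, a in zip(heatmaps, annotations)]
    return targets


# Toy usage: the annotation is one pixel left of where the network responds most.
toy = [[0.0] * 7 for _ in range(7)]
toy[3][3], toy[3][4] = 0.5, 1.0
print(search_real_ground_truth(toy, (3, 3)))  # (3, 4)
```

In this sketch the prior keeps the searched point close to the human annotation, while the heatmap term pulls it toward the position the trained network responds to most strongly. Increasing `radius` or `sigma` lets the network move the target further from the annotation.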
Pages: 3462-3471
Number of pages: 10