Combining CNNs and Markov-like Models for Facial Landmark Detection with Spatial Consistency Estimates

Cited by: 4
Authors
Gdoura, Ahmed [1, 2]
Deguenther, Markus [2]
Lorenz, Birgit [1, 3]
Effland, Alexander [4]
Affiliations
[1] Justus Liebig Univ Giessen, Dept Ophthalmol, D-35392 Giessen, Germany
[2] TH Mittelhessen, Dept Math Nat Sci & Data Proc, D-61169 Friedberg, Germany
[3] Univ Hosp Bonn, Dept Ophthalmol, D-53127 Bonn, Germany
[4] Univ Bonn, Inst Appl Math, D-53115 Bonn, Germany
Keywords
facial landmark detection; convolutional neural networks; Markov random field; network
DOI
10.3390/jimaging9050104
Chinese Library Classification number
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
The accurate localization of facial landmarks is essential for several tasks, including face recognition, head pose estimation, facial region extraction, and emotion detection. Although the number of required landmarks is task-specific, models are typically trained on all available landmarks in the datasets, which limits efficiency. Furthermore, model performance is strongly influenced by scale-dependent local appearance information around landmarks and by the global shape information generated by them. To account for this, we propose a lightweight hybrid model for facial landmark detection designed specifically for pupil region extraction. Our design combines a convolutional neural network (CNN) with a Markov random field (MRF)-like process trained on only 17 carefully selected landmarks. The advantage of our model is its ability to run different image scales on the same convolutional layers, resulting in a significant reduction in model size. In addition, we employ an approximation of the MRF that is run on a subset of landmarks to validate the spatial consistency of the generated shape. This validation process is performed against a learned conditional distribution expressing the location of one landmark relative to its neighbor. Experimental results on popular facial landmark localization datasets such as 300W, WFLW, and HELEN demonstrate the accuracy of our proposed model. Furthermore, our model achieves state-of-the-art performance on a well-defined robustness metric. In conclusion, the results demonstrate the ability of our lightweight model to filter out spatially inconsistent predictions, even with significantly fewer training landmarks.
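The spatial consistency check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, purely for illustration, that the learned conditional distribution of one landmark relative to its neighbor is a 2D Gaussian over pixel offsets, and that a prediction is rejected when its Mahalanobis distance exceeds a fixed threshold. The function names, the Gaussian form, and the threshold of 3.0 are all hypothetical.

```python
import math


def fit_offset_gaussian(parents, children):
    """Fit a 2D Gaussian to (child - parent) offsets from training shapes.

    Assumption: the paper's learned conditional distribution is replaced
    here by a single Gaussian; the real model may differ.
    """
    offsets = [(cx - px, cy - py) for (px, py), (cx, cy) in zip(parents, children)]
    n = len(offsets)
    mx = sum(dx for dx, _ in offsets) / n
    my = sum(dy for _, dy in offsets) / n
    sxx = sum((dx - mx) ** 2 for dx, _ in offsets) / n
    syy = sum((dy - my) ** 2 for _, dy in offsets) / n
    sxy = sum((dx - mx) * (dy - my) for dx, dy in offsets) / n
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(parent, child, mean, cov, eps=1e-6):
    """Mahalanobis distance of the observed offset under the fitted Gaussian."""
    sxx, sxy, syy = cov
    sxx += eps  # small ridge keeps the covariance invertible
    syy += eps
    det = sxx * syy - sxy * sxy
    dx = child[0] - parent[0] - mean[0]
    dy = child[1] - parent[1] - mean[1]
    # Quadratic form d^T * inverse(cov) * d for a symmetric 2x2 covariance.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


def is_consistent(parent, child, mean, cov, threshold=3.0):
    """Hypothetical decision rule: accept offsets within `threshold` deviations."""
    return mahalanobis(parent, child, mean, cov) <= threshold


# Toy training offsets between two neighboring landmarks (illustrative values).
train_parents = [(0, 0)] * 5
train_children = [(10, 0), (12, 1), (8, -1), (10, 2), (10, -2)]
mean, cov = fit_offset_gaussian(train_parents, train_children)

print(is_consistent((5, 5), (15, 5), mean, cov))  # offset equal to the mean
print(is_consistent((5, 5), (35, 5), mean, cov))  # offset far from the mean
```

In this sketch a predicted landmark pair is kept only when the neighbor lies where the training shapes suggest it should be relative to its partner, which is one simple way to filter spatially inconsistent predictions.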
Pages: 17