Towards unsupervised learning of joint facial landmark detection and head pose estimation

Cited by: 0

Authors:

Zou, Zhiming [1]

Jia, Dian [1]

Tang, Wei [1]

Affiliation:

[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA

Source:

Pattern Recognition | 2025 / Volume 162

Funding:

U.S. National Science Foundation

Keywords:

Facial landmark detection; Head pose estimation; Unsupervised learning

DOI:

10.1016/j.patcog.2025.111393

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Deep learning approaches have drastically advanced the state of the art in facial landmark detection and head pose estimation. Recent work shows that meaningful landmarks can be discovered from unlabeled image collections. However, these methods mine only local visual patterns in images as 2D landmarks and ignore the 3D object structure. Consequently, they can neither estimate the object pose directly from an image nor use it to improve landmark discovery. We therefore propose a novel framework that learns both tasks jointly. It comprises a multi-task network for joint landmark and pose prediction, a set of learnable 3D canonical landmarks, and an image generation network. These components are learned collaboratively on unlabeled face images through an integrated loss combining conditional image generation and geometric consistency. We also investigate different strategies for handling potential face deformation. Extensive experiments show that our approach is effective in both tasks and even comparable to some supervised methods. The code is available at https://github.com/ZhimingZo/unsup-face-analysis
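
The abstract mentions a geometric consistency term that ties the predicted 2D landmarks to a set of 3D canonical landmarks through the estimated head pose. The sketch below illustrates one way such a term could be written. It is not the authors' implementation: the abstract does not specify the camera model or the loss form, so the weak-perspective projection, the Euler-angle convention, the mean squared error, and every function name here (`rotation_from_euler`, `project`, `geometric_consistency`) are assumptions made for illustration only.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_from_euler(yaw, pitch, roll):
    """Build R = Rz(roll) @ Ry(yaw) @ Rx(pitch); angles in radians.

    The axis convention is an assumption, not taken from the paper.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def project(canonical_3d, rotation, scale, shift):
    """Weak-perspective projection (assumed): rotate, drop depth, scale, shift."""
    points_2d = []
    for x, y, z in canonical_3d:
        u = rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z
        v = rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z
        points_2d.append((scale * u + shift[0], scale * v + shift[1]))
    return points_2d


def geometric_consistency(landmarks_2d, canonical_3d, rotation, scale, shift):
    """Mean squared distance between detected 2D landmarks and the
    projections of the 3D canonical landmarks under the given pose."""
    projected = project(canonical_3d, rotation, scale, shift)
    squared = [(pu - u) ** 2 + (pv - v) ** 2
               for (pu, pv), (u, v) in zip(projected, landmarks_2d)]
    return sum(squared) / len(squared)


# Toy example with three hypothetical canonical landmarks and a frontal pose.
canonical = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frontal = rotation_from_euler(0.0, 0.0, 0.0)
detected = project(canonical, frontal, 2.0, (1.0, 1.0))
loss = geometric_consistency(detected, canonical, frontal, 2.0, (1.0, 1.0))
```

In a learned setting, the pose, the canonical landmarks, and the landmark detector would all receive gradients from a term like this, which is why a differentiable framework would replace the plain-Python arithmetic used here.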


Pages: 13

