Facial Action Unit Detection and Intensity Estimation From Self-Supervised Representation

Times Cited: 0
Authors
Ma, Bowen [1 ]
An, Rudong [1 ]
Zhang, Wei [1 ]
Ding, Yu [1 ]
Zhao, Zeng [1 ]
Zhang, Rongsheng [1 ]
Lv, Tangjie [1 ]
Fan, Changjie [1 ]
Hu, Zhipeng [1 ]
Affiliations
[1] Netease Fuxi AI Lab, Hangzhou 310052, Peoples R China
Keywords
Gold; Estimation; Image reconstruction; Annotations; Face recognition; Task analysis; Visualization; Facial action unit; facial expression recognition; facial representation model; self-supervised pre-training; RECOGNITION;
DOI
10.1109/TAFFC.2024.3367015
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a fine-grained, local measurement of expressive behavior, facial action unit (FAU) analysis (e.g., detection and intensity estimation) is well documented to require time-consuming, labor-intensive, and error-prone annotation. A long-standing challenge of FAU analysis therefore arises from the scarcity of manually annotated data, which largely limits the generalization ability of trained models. Numerous previous works have attempted to alleviate this issue via semi-/weakly supervised methods and extra auxiliary information. However, these methods still require domain knowledge and have not yet eliminated the high dependency on data annotation. This article introduces MAE-Face, a robust facial representation model for AU analysis. Using masked autoencoding as its self-supervised pre-training approach, MAE-Face first learns a high-capacity model from a feasible collection of face images without additional data annotations. After fine-tuning on AU datasets, MAE-Face exhibits convincing performance on both AU detection and AU intensity estimation, achieving a new state-of-the-art on nearly all evaluation results. Further investigation shows that MAE-Face achieves decent performance even when fine-tuned on only 1% of the AU training set, strongly demonstrating its robustness and generalization. The pre-trained model is available at our GitHub repository.
Pages: 1669-1683
Page count: 15
Related Papers
111 in total
[71]   JÂA-Net: Joint Facial Action Unit Detection and Face Alignment Via Adaptive Attention [J].
Shao, Zhiwen ;
Liu, Zhilei ;
Cai, Jianfei ;
Ma, Lizhuang .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 (02) :321-340
[72]  
Shi Jiawei, 2021, arXiv
[73]   Intraclass Correlations: Uses in Assessing Rater Reliability [J].
Shrout, P. E. ;
Fleiss, J. L. .
PSYCHOLOGICAL BULLETIN, 1979, 86 (02) :420-428
[74]   A Novel Machine Vision-Based 3D Facial Action Unit Identification for Fatigue Detection [J].
Sikander, Gulbadan ;
Anwar, Shahzad .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2021, 22 (05) :2730-2740
[75]   Hybrid Message Passing with Performance-Driven Structures for Facial Action Unit Detection [J].
Song, Tengfei ;
Cui, Zijun ;
Zheng, Wenming ;
Ji, Qiang .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :6263-6272
[76]  
Song TF, 2021, AAAI CONF ARTIF INTE, V35, P5993, DOI 10.1609/aaai.v35i7.16748
[77]  
Steiner A, 2022, arXiv, arXiv:2106.10270
[78]   Facial action unit recognition by exploiting their dynamic and semantic relationships [J].
Tong, Yan ;
Liao, Wenhui ;
Ji, Qiang .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2007, 29 (10) :1683-1699
[79]  
Touvron H, 2021, PR MACH LEARN RES, V139, P7358
[80]  
Valstar M., 2006, Computer Vision and Pattern Recognition Workshop, P149