Facial Action Unit Detection and Intensity Estimation From Self-Supervised Representation

Cited by: 0
Authors
Ma, Bowen [1]
An, Rudong [1]
Zhang, Wei [1]
Ding, Yu [1]
Zhao, Zeng [1]
Zhang, Rongsheng [1]
Lv, Tangjie [1]
Fan, Changjie [1]
Hu, Zhipeng [1]
Affiliation
[1] Netease Fuxi AI Lab, Hangzhou 310052, People's Republic of China
Keywords
Gold; Estimation; Image reconstruction; Annotations; Face recognition; Task analysis; Visualization; Facial action unit; facial expression recognition; facial representation model; self-supervised pre-training; RECOGNITION;
DOI
10.1109/TAFFC.2024.3367015
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
As a fine-grained, local measurement of expressive facial behavior, facial action unit (FAU) analysis (e.g., detection and intensity estimation) is well documented to require time-consuming, labor-intensive, and error-prone annotation. A long-standing challenge of FAU analysis therefore arises from the scarcity of manual annotations, which severely limits the generalization ability of trained models. Many previous works have tried to alleviate this issue via semi- or weakly supervised methods and extra auxiliary information. However, these methods still require domain knowledge and have not yet removed the heavy dependence on data annotation. This article introduces MAE-Face, a robust facial representation model for AU analysis. Using masked autoencoding as the self-supervised pre-training approach, MAE-Face first learns a high-capacity model from a feasible collection of face images without additional data annotations. After being fine-tuned on AU datasets, MAE-Face exhibits convincing performance for both AU detection and AU intensity estimation, achieving a new state of the art on nearly all evaluations. Further investigation shows that MAE-Face achieves decent performance even when fine-tuned on only 1% of the AU training set, strongly demonstrating its robustness and generalization ability. The pre-trained model is available at our GitHub repository.
Pages: 1669-1683
Number of pages: 15
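The abstract above describes a two-stage recipe: masked-autoencoder pre-training on unlabeled face images, followed by fine-tuning the pre-trained encoder on annotated AU data. The sketch below is a minimal, hypothetical PyTorch illustration of that recipe, not the authors' MAE-Face implementation; the tiny model sizes, 16-pixel patches, 12 output AUs, and 75% mask ratio are assumptions chosen for brevity.

# Minimal sketch (not the authors' code) of MAE-style pre-training followed by
# AU fine-tuning. Patch size, widths, AU count and mask ratio are illustrative.
import torch
import torch.nn as nn

PATCH, DIM, N_AUS = 16, 256, 12  # assumed patch size, embedding width, number of AUs


class TinyEncoder(nn.Module):
    """Patchify an image and encode the visible patches with a small Transformer."""
    def __init__(self, img_size=224):
        super().__init__()
        self.n_patches = (img_size // PATCH) ** 2
        self.proj = nn.Conv2d(3, DIM, kernel_size=PATCH, stride=PATCH)
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches, DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, imgs, keep_idx=None):
        x = self.proj(imgs).flatten(2).transpose(1, 2) + self.pos       # (B, N, DIM)
        if keep_idx is not None:                                        # drop masked patches
            x = torch.gather(x, 1, keep_idx[..., None].expand(-1, -1, DIM))
        return self.blocks(x)


class TinyDecoder(nn.Module):
    """Reconstruct the pixels of every patch from the encoded visible patches."""
    def __init__(self, n_patches):
        super().__init__()
        self.n_patches = n_patches
        self.mask_token = nn.Parameter(torch.zeros(1, 1, DIM))
        self.pos = nn.Parameter(torch.zeros(1, n_patches, DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, PATCH * PATCH * 3)

    def forward(self, enc, keep_idx):
        B = enc.size(0)
        full = self.mask_token.expand(B, self.n_patches, DIM).clone()
        full.scatter_(1, keep_idx[..., None].expand(-1, -1, DIM), enc)  # re-insert visible patches
        return self.head(self.blocks(full + self.pos))                  # (B, N, patch pixels)


def pretrain_step(encoder, decoder, imgs, mask_ratio=0.75):
    """One self-supervised step: predict the pixels of the masked patches only."""
    B, N = imgs.size(0), encoder.n_patches
    n_keep = int(N * (1 - mask_ratio))
    shuffle = torch.rand(B, N, device=imgs.device).argsort(dim=1)       # random mask per image
    keep_idx, mask_idx = shuffle[:, :n_keep], shuffle[:, n_keep:]
    target = imgs.unfold(2, PATCH, PATCH).unfold(3, PATCH, PATCH)       # patchified pixel targets
    target = target.permute(0, 2, 3, 1, 4, 5).reshape(B, N, -1)
    pred = decoder(encoder(imgs, keep_idx), keep_idx)
    idx = mask_idx[..., None].expand(-1, -1, pred.size(-1))
    return nn.functional.mse_loss(torch.gather(pred, 1, idx),
                                  torch.gather(target, 1, idx))


class AUDetector(nn.Module):
    """Fine-tuning stage: pooled encoder features -> one sigmoid logit per AU."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(DIM, N_AUS)

    def forward(self, imgs):
        return self.head(self.encoder(imgs).mean(dim=1))


if __name__ == "__main__":
    enc, faces = TinyEncoder(), torch.randn(2, 3, 224, 224)             # stand-in face batch
    loss_pre = pretrain_step(enc, TinyDecoder(enc.n_patches), faces)    # stage 1: no labels
    labels = torch.randint(0, 2, (2, N_AUS)).float()                    # stand-in AU labels
    loss_ft = nn.functional.binary_cross_entropy_with_logits(
        AUDetector(enc)(faces), labels)                                 # stage 2: labeled AUs
    print(f"pretrain loss {loss_pre.item():.3f}  fine-tune loss {loss_ft.item():.3f}")

The point the sketch mirrors is that the reconstruction stage consumes only raw face images, while AU labels are needed only by the small sigmoid head attached during fine-tuning, which is why even a 1% labeled subset can still yield a usable detector.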