Weakly Supervised Regional and Temporal Learning for Facial Action Unit Recognition

Cited by: 3

Authors:

Yan, Jingwei [1]

Wang, Jingjing [1]

Li, Qiang [1]

Wang, Chunmao [1]

Pu, Shiliang [1]

Affiliations:

[1] Hikvis Res Inst, Hangzhou 310051, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2023 / Vol. 25

Keywords:

Gold; Task analysis; Face recognition; Feature extraction; Representation learning; Optical imaging; Facial muscles; Facial action unit recognition; regional and temporal feature learning; weakly supervised learning;

DOI:

10.1109/TMM.2022.3160061

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Automatic facial action unit (AU) recognition is a challenging task due to the scarcity of manual annotations. To alleviate this problem, a great deal of effort has been dedicated to exploiting various weakly supervised methods that leverage large amounts of unlabeled data. However, many aspects of the unique properties of AUs, such as their regional and relational characteristics, have not been sufficiently explored in previous work. Motivated by this, we take the AU properties into consideration and propose two auxiliary AU-related tasks that use unlabeled data in a self-supervised manner to bridge the gap between limited annotations and model performance. Specifically, to enhance the discrimination of regional features with AU relation embedding, we design an RoI inpainting task that recovers randomly cropped AU patches. Meanwhile, a single-image-based optical flow estimation task is proposed to leverage the dynamic change of facial muscles and encode the motion information into the global feature representation. With these two self-supervised auxiliary tasks, the backbone network better captures the local features, mutual relations and motion cues of AUs. Furthermore, by incorporating semi-supervised learning, we propose an end-to-end trainable framework named weakly supervised regional and temporal learning (WSRTL) for AU recognition. Extensive experiments on BP4D and DISFA demonstrate the superiority of our method, which achieves new state-of-the-art performance.
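
The RoI inpainting idea in the abstract (remove randomly chosen AU patches, then ask the network to recover them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mask_random_rois` and `inpainting_loss`, the patch size, the number of masked patches, the RoI centre coordinates and the L1 loss are all assumptions made for illustration, and the reconstruction network itself is left out.

```python
import numpy as np


def mask_random_rois(image, centers, patch=16, n_mask=2, rng=None):
    """Zero out n_mask square patches centred on randomly chosen RoI centres.

    `centers` is a list of (row, col) positions; in practice these would come
    from facial landmarks, here they are hypothetical inputs.
    Returns the masked image and a boolean map of the removed pixels.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    masked = image.copy()
    removed = np.zeros((h, w), dtype=bool)
    chosen = rng.choice(len(centers), size=min(n_mask, len(centers)), replace=False)
    half = patch // 2
    for idx in chosen:
        cy, cx = centers[idx]
        # Clip the patch to the image borders.
        y0, y1 = max(cy - half, 0), min(cy + half, h)
        x0, x1 = max(cx - half, 0), min(cx + half, w)
        masked[y0:y1, x0:x1] = 0
        removed[y0:y1, x0:x1] = True
    return masked, removed


def inpainting_loss(pred, target, removed):
    """Mean absolute error restricted to the removed pixels (assumed L1 loss)."""
    if not removed.any():
        return 0.0
    return float(np.abs(pred[removed] - target[removed]).mean())
```

In a training loop, the masked image would be fed to the backbone and a decoder, and `inpainting_loss` would compare the decoder output with the original image on the removed regions only, so that the network has to infer each missing AU region from the remaining ones.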

Pages: 1760-1772

Number of pages: 13
