Weakly Supervised Regional and Temporal Learning for Facial Action Unit Recognition

Cited by: 3
Authors
Yan, Jingwei [1]
Wang, Jingjing [1]
Li, Qiang [1]
Wang, Chunmao [1]
Pu, Shiliang [1]
Affiliations
[1] Hikvis Res Inst, Hangzhou 310051, Peoples R China
Keywords
Gold; Task analysis; Face recognition; Feature extraction; Representation learning; Optical imaging; Facial muscles; Facial action unit recognition; regional and temporal feature learning; weakly supervised learning
DOI
10.1109/TMM.2022.3160061
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic facial action unit (AU) recognition is a challenging task due to the scarcity of manual annotations. To alleviate this problem, considerable effort has been dedicated to weakly supervised methods that leverage large amounts of unlabeled data. However, several properties unique to AUs, such as their regional and relational characteristics, have not been sufficiently explored in previous work. Motivated by this, we take these AU properties into consideration and propose two auxiliary AU-related tasks that use unlabeled data in a self-supervised manner to bridge the gap between limited annotations and model performance. Specifically, to enhance the discrimination of regional features with AU relation embedding, we design an RoI inpainting task that recovers randomly cropped AU patches. Meanwhile, a single-image-based optical flow estimation task is proposed to leverage the dynamic change of facial muscles and encode motion information into the global feature representation. With these two self-supervised auxiliary tasks, the backbone network better captures local features, mutual relations, and motion cues of AUs. Furthermore, by incorporating semi-supervised learning, we propose an end-to-end trainable framework named weakly supervised regional and temporal learning (WSRTL) for AU recognition. Extensive experiments on BP4D and DISFA demonstrate the superiority of our method, which achieves new state-of-the-art performance.
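The RoI inpainting task mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the square patch shape, the zero fill value, and the masked L2 loss are all assumptions made for illustration, and a real pipeline would feed the masked image through a network instead of comparing it directly with the original.

```python
import random


def mask_random_rois(image, roi_centers, patch_size, num_masked, rng):
    """Zero out `num_masked` square patches centred on randomly chosen RoI centres.

    image: 2-D list of floats (H x W, grayscale).
    roi_centers: list of (row, col) positions; hypothetical AU centres.
    Returns (masked_image, mask), where mask[r][c] == 1 marks a removed pixel.
    """
    height, width = len(image), len(image[0])
    masked = [row[:] for row in image]          # copy so the input stays intact
    mask = [[0] * width for _ in range(height)]
    half = patch_size // 2
    for row_c, col_c in rng.sample(roi_centers, num_masked):
        for r in range(max(0, row_c - half), min(height, row_c + half + 1)):
            for c in range(max(0, col_c - half), min(width, col_c + half + 1)):
                masked[r][c] = 0.0
                mask[r][c] = 1
    return masked, mask


def masked_l2_loss(prediction, target, mask):
    """Mean squared error computed only over the removed pixels (assumed loss)."""
    total, count = 0.0, 0
    for pred_row, tgt_row, mask_row in zip(prediction, target, mask):
        for p, t, m in zip(pred_row, tgt_row, mask_row):
            if m:
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


rng = random.Random(0)
image = [[1.0] * 8 for _ in range(8)]           # toy 8x8 "face"
centers = [(2, 2), (2, 5), (5, 3)]              # hypothetical AU centres
masked, mask = mask_random_rois(image, centers, patch_size=3, num_masked=2, rng=rng)

# Stand-in for a network output: using the masked image itself as the
# "prediction" gives the loss of a model that recovers nothing.
loss = masked_l2_loss(masked, image, mask)
```

Restricting the loss to the removed pixels keeps the auxiliary objective focused on the AU regions, which is one plausible reading of "recover the randomly cropped AU patches".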
Pages: 1760-1772
Number of pages: 13