UA-FER: Uncertainty-aware representation learning for facial expression recognition

Cited by: 1
Authors
Zhou, Haoliang [1 ]
Huang, Shucheng [1 ]
Xu, Yuqiao [2 ]
Affiliations
[1] Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang 212003, Peoples R China
[2] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Facial expression recognition; Uncertainty-aware representation learning; Evidential deep learning; Vision-language pre-training model; Knowledge distillation; FEATURES;
DOI
10.1016/j.neucom.2024.129261
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Facial Expression Recognition (FER) remains a challenging task due to unconstrained conditions like variations in illumination, pose, and occlusion. Current FER approaches mainly focus on learning discriminative features through local attention and global perception of visual encoders, while neglecting the rich semantic information in the text modality. Additionally, these methods rely solely on the softmax-based activation layer for predictions, resulting in overconfident decision-making that hampers the effective handling of uncertain samples and relationships. Such insufficient representations and overconfident predictions degrade recognition performance, particularly in unconstrained scenarios. To tackle these issues, we propose an end-to-end FER framework called UA-FER, which integrates vision-language pre-training (VLP) models with evidential deep learning (EDL) theory to enhance recognition accuracy and robustness. Specifically, to identify multi-grained discriminative regions, we propose the Multi-granularity Feature Decoupling (MFD) module, which decouples global and local facial representations based on image-text affinity while distilling the universal knowledge from the pre-trained VLP models. Additionally, to mitigate misjudgments in uncertain visual-textual relationships, we introduce the Relation Uncertainty Calibration (RUC) module, which corrects these uncertainties using EDL theory. In this way, the model enhances its ability to capture emotion-related discriminative representations and tackle uncertain relationships, thereby improving overall recognition accuracy and robustness. Extensive experiments on in-the-wild and in-the-lab datasets demonstrate that our UA-FER outperforms the state-of-the-art models.
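The evidential deep learning idea the abstract leans on replaces a softmax probability with Dirichlet evidence, so the model can report how uncertain a prediction is rather than always committing overconfidently. A minimal sketch of that standard subjective-logic formulation (not the paper's actual RUC module; the function name and ReLU evidence mapping are illustrative assumptions):

```python
import numpy as np

def edl_uncertainty(logits):
    """Hypothetical EDL head: map logits to Dirichlet evidence.

    Returns per-class belief masses and a scalar uncertainty,
    following the common subjective-logic formulation of
    evidential deep learning (not UA-FER's exact module).
    """
    evidence = np.maximum(logits, 0.0)       # non-negative evidence, e.g. via ReLU
    alpha = evidence + 1.0                   # Dirichlet parameters alpha_k = e_k + 1
    strength = alpha.sum()                   # total Dirichlet strength S
    belief = evidence / strength             # belief mass b_k = e_k / S
    uncertainty = len(alpha) / strength      # u = K / S; beliefs and u sum to 1
    return belief, uncertainty

# Strong evidence for one class gives low uncertainty;
# zero evidence for every class gives maximal uncertainty u = 1.
b, u = edl_uncertainty(np.array([10.0, 0.0, 0.0]))
b0, u0 = edl_uncertainty(np.array([0.0, 0.0, 0.0]))
```

Under this formulation a sample with little total evidence is flagged as uncertain instead of being forced into a confident class, which is the behavior the paper's uncertainty calibration exploits.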
Pages: 13