Depth-Aware Dual-Stream Interactive Transformer Network for Facial Expression Recognition

Times Cited: 0
Authors
Jiang, Yiben [1 ]
Yang, Xiao [1 ,2 ]
Fu, Keren [1 ,2 ]
Yang, Hongyu [1 ,2 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
[2] Sichuan Univ, Natl Key Lab Fundamental Sci Synthet Vis, Chengdu, Sichuan, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XI | 2025 / Vol. 15041
Funding
National Natural Science Foundation of China;
Keywords
Facial expression recognition; Transformer; Depth-aware feature extraction; Attention;
DOI
10.1007/978-981-97-8795-1_38
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Facial Expression Recognition (FER) is a challenging task in computer vision, especially in the wild, where factors such as diverse head poses and occlusions can significantly degrade recognition performance. Recent developments in RGB-D Face Recognition (FR) have highlighted the robustness of depth information to occlusions and pose variations, which facilitates capturing finer 3D facial details and consequently enhances performance. Nevertheless, prevalent FER datasets and application scenarios typically lack depth information and offer only RGB images. Hence, this paper introduces a novel RGB FER approach grounded in depth-aware feature perception and a dual-stream interactive transformer network. Real depth is not required during inference, so even when only RGB data is available, our method can effectively leverage perceived depth information for recognition. Guided by real depth features extracted from depth images in an RGB-D FR dataset, we design and pre-train an auxiliary encoder, the Depth-Aware Encoder (DAEncoder), to perceive and extract depth-aware expression features from RGB faces. We then propose a Dual-stream Interactive Transformer (DIT) that uses cross-attention to enable interaction between RGB and depth-aware features. Additionally, the RGB stream combines self-attention and cross-attention to fuse information for the final expression prediction. Experimental results demonstrate the promising performance of our method on several FER datasets, including RAF-DB, AffectNet 7, and AffectNet 8.
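To make the described fusion mechanism concrete, the following is a minimal PyTorch sketch of one dual-stream interactive block in which RGB tokens first self-attend and then cross-attend to depth-aware tokens produced by an auxiliary encoder. The class name, token dimensions, and pre-norm layout are illustrative assumptions based only on the abstract, not the authors' implementation.

```python
# Hypothetical sketch of a dual-stream interactive transformer block
# (names and dimensions are illustrative, not the paper's code).
import torch
import torch.nn as nn

class DualStreamInteractiveBlock(nn.Module):
    """RGB tokens self-attend, then cross-attend to depth-aware tokens."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, rgb_tokens, depth_tokens):
        # Self-attention within the RGB stream.
        x = self.norm1(rgb_tokens)
        rgb_tokens = rgb_tokens + self.self_attn(x, x, x, need_weights=False)[0]
        # Cross-attention: RGB queries attend to depth-aware keys/values.
        q = self.norm2(rgb_tokens)
        rgb_tokens = rgb_tokens + self.cross_attn(q, depth_tokens, depth_tokens, need_weights=False)[0]
        # Feed-forward fusion of the interacted features.
        return rgb_tokens + self.mlp(self.norm3(rgb_tokens))

# Usage: fuse RGB patch tokens with depth-aware tokens from a pre-trained auxiliary encoder.
rgb = torch.randn(2, 49, 512)    # e.g. 7x7 patch tokens from an RGB backbone (assumed shape)
depth = torch.randn(2, 49, 512)  # depth-aware tokens, e.g. from a DAEncoder-like module (assumed shape)
fused = DualStreamInteractiveBlock()(rgb, depth)
print(fused.shape)  # torch.Size([2, 49, 512])
```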
Pages: 563-577
Number of pages: 15