Depth-Aware Dual-Stream Interactive Transformer Network for Facial Expression Recognition

Times Cited: 0
Authors
Jiang, Yiben [1 ]
Yang, Xiao [1 ,2 ]
Fu, Keren [1 ,2 ]
Yang, Hongyu [1 ,2 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
[2] Sichuan Univ, Natl Key Lab Fundamental Sci Synthet Vis, Chengdu, Sichuan, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XI | 2025 / Vol. 15041
Funding
National Natural Science Foundation of China;
Keywords
Facial expression recognition; Transformer; Depth-aware feature extraction; Attention;
DOI
10.1007/978-981-97-8795-1_38
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Facial Expression Recognition (FER) is a challenging task in computer vision, especially in the wild, where factors such as diverse head poses and occlusions can significantly degrade recognition performance. Recent developments in RGB-D Face Recognition (FR) have highlighted the robustness of depth information to occlusions and pose variations, which facilitates capturing finer 3D facial details and consequently enhances performance. Nevertheless, prevalent FER datasets and application scenarios typically lack depth information and offer only RGB images. Hence, this paper introduces a novel RGB FER approach grounded in depth-aware feature perception and a dual-stream interactive transformer network. Real depth is not required during inference, so even when only RGB data is available, our method can effectively leverage perceived depth information for recognition. Guided by real depth features extracted from depth images in an RGB-D FR dataset, we design and pre-train an auxiliary encoder, the Depth-Aware Encoder (DAEncoder), to perceive and extract depth-aware expression features from RGB faces. We then propose a Dual-stream Interactive Transformer (DIT) that uses cross-attention to enable interaction between RGB and depth-aware features. Additionally, the RGB stream combines self-attention and cross-attention to fuse information for the final expression prediction. Experimental results demonstrate the promising performance of our method on several FER datasets, including RAF-DB, AffectNet 7, and AffectNet 8.
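To make the described fusion mechanism concrete, the following is a minimal PyTorch sketch of one dual-stream interactive block in which RGB tokens first self-attend and then cross-attend to depth-aware tokens produced by an auxiliary encoder. The class name, token dimensions, and pre-norm layout are illustrative assumptions based only on the abstract, not the authors' implementation.

```python
# Hypothetical sketch of a dual-stream interactive transformer block
# (names and dimensions are illustrative, not the paper's code).
import torch
import torch.nn as nn

class DualStreamInteractiveBlock(nn.Module):
    """RGB tokens self-attend, then cross-attend to depth-aware tokens."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, rgb_tokens, depth_tokens):
        # Self-attention within the RGB stream.
        x = self.norm1(rgb_tokens)
        rgb_tokens = rgb_tokens + self.self_attn(x, x, x, need_weights=False)[0]
        # Cross-attention: RGB queries attend to depth-aware keys/values.
        q = self.norm2(rgb_tokens)
        rgb_tokens = rgb_tokens + self.cross_attn(q, depth_tokens, depth_tokens, need_weights=False)[0]
        # Feed-forward fusion of the interacted features.
        return rgb_tokens + self.mlp(self.norm3(rgb_tokens))

# Usage: fuse RGB patch tokens with depth-aware tokens from a pre-trained auxiliary encoder.
rgb = torch.randn(2, 49, 512)    # e.g. 7x7 patch tokens from an RGB backbone (assumed shape)
depth = torch.randn(2, 49, 512)  # depth-aware tokens, e.g. from a DAEncoder-like module (assumed shape)
fused = DualStreamInteractiveBlock()(rgb, depth)
print(fused.shape)  # torch.Size([2, 49, 512])
```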
Pages: 563-577
Number of pages: 15