Attention-Rectified and Texture-Enhanced Cross-Attention Transformer Feature Fusion Network for Facial Expression Recognition

Cited by: 19
Authors
Sun, Mingyi [1]
Cui, Weigang [2]
Zhang, Yue [1]
Yu, Shuyue [3]
Liao, Xiaofeng [4]
Hu, Bin [5]
Li, Yang [1,6]
Affiliations
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Engn Med, Beijing 100191, Peoples R China
[3] Beijing Aerosp Measurement & Control Technol Co L, Beijing 100024, Peoples R China
[4] Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China
[5] Lanzhou Univ, Sch Informat Sci & Engn, Gansu Prov Key Lab Wearable Comp, Lanzhou 730000, Peoples R China
[6] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Convolutional neural network (CNN); face detector; facial expression recognition (FER); texture features; transformer;
DOI
10.1109/TII.2023.3253188
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Facial expression recognition (FER) in the wild is a challenging task for affective computing in human-machine interaction. Because FER datasets are commonly imbalanced, most existing methods trained with a simple cross-entropy loss fail to learn the most prominent regions of facial images, which limits the robustness and interpretability of the model. In addition, these methods capture only local features of the original images with multisize shallow convolutions and ignore facial texture characteristics, leading to suboptimal recognition performance. To address these issues, we propose a novel FER network, named the attention-rectified and texture-enhanced cross-attention transformer feature fusion network (AR-TE-CATFFNet). First, an attention-rectified convolution block is designed to help multiple convolution heads focus on the critical areas of human faces and to improve model generalization. Second, a texture enhancement block captures texture features through the local binary pattern (LBP) and the gray-level co-occurrence matrix (GLCM), which addresses the lack of texture information. Finally, a cross-attention transformer feature fusion block globally integrates red, green, blue (RGB) features with texture features, which helps to boost recognition accuracy. Experimental results on three public datasets validate the efficacy of the proposed method, which achieves superior classification performance against existing methods: 89.50% on the real-world affective faces database (RAF-DB), 65.66% on AffectNet, and 74.84% on FER2013.
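The two texture descriptors named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' texture enhancement block: the abstract does not give the neighbourhood size, offsets, or grey-level quantisation, so the 3x3 neighbourhood, the horizontal offset, and the helper names `lbp_codes`, `glcm`, and `glcm_contrast` are all assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]

# Offsets of the 8 neighbours, clockwise starting from the top-left pixel.
# (Assumed ordering; other LBP variants start elsewhere.)
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: Image) -> Image:
    """Basic 3x3 local binary pattern code for every interior pixel.

    A neighbour contributes a 1-bit when it is >= the centre pixel.
    Border pixels are skipped, so the output is (h-2) x (w-2).
    """
    h, w = len(img), len(img[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row.append(code)
        out.append(row)
    return out


def glcm(img: Image, levels: int, dr: int = 0, dc: int = 1) -> List[List[int]]:
    """Count how often grey level i is followed by grey level j at offset (dr, dc)."""
    h, w = len(img), len(img[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                counts[img[r][c]][img[r2][c2]] += 1
    return counts


def glcm_contrast(counts: List[List[int]]) -> float:
    """Contrast statistic: mean squared grey-level difference over all counted pairs."""
    total = sum(sum(row) for row in counts)
    weighted = sum((i - j) ** 2 * v
                   for i, row in enumerate(counts)
                   for j, v in enumerate(row))
    return weighted / total


if __name__ == "__main__":
    # Tiny 3x3 image with 4 grey levels (0..3), purely illustrative.
    patch = [[0, 1, 2],
             [1, 1, 3],
             [0, 2, 3]]
    print(lbp_codes(patch))
    print(glcm_contrast(glcm(patch, levels=4)))
```

In a network like the one described, maps of this kind would presumably be computed per image and passed to a feature extractor alongside the RGB input; how the paper does this is not stated in the abstract.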
Pages: 11823-11832
Page count: 10