FUSING STRUCTURE AND APPEARANCE FEATURES IN FACIAL EXPRESSION RECOGNITION TRANSFORMER

Cited by: 0
Authors
Meng, Siwei [1]
Shi, Wuzhen [1]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Guangdong Prov Engn Lab Digital Creat Technol, Shenzhen, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Facial expression recognition; cross-fusion transformer; facial landmarks; structure feature; appearance feature;
DOI
10.1109/ICASSP48485.2024.10447031
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Facial expression recognition (FER) methods are fundamental in various human-computer interaction scenarios. Although deep learning-based models have made substantial progress in the FER field, they primarily focus on capturing facial appearance features while neglecting the importance of structure features, which encompass the overall shape and structural details of the key facial regions. We propose a Structure and Appearance Feature Cross-fusion Transformer (SAFCT) network to leverage both structure and appearance features. Specifically, we introduce a gradient-based structure feature to simultaneously capture the overall face shape and local organ variations. For appearance features, we extract both global and landmark-guided local features to capture global texture and local details. Furthermore, we employ a structure-dominated cross-fusion transformer to integrate these three facial features. Extensive experiments show that SAFCT achieves state-of-the-art recognition performance on widely used FER datasets.
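The two ideas named in the abstract, a gradient-based structure feature and a cross-fusion step in which structure features drive the fusion, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the gradient operator, the token shapes, or how "structure-dominated" is realized. The sketch assumes central-difference gradients, a single attention head without learned projections, and structure tokens acting as queries over the appearance tokens. The function names `gradient_structure_map` and `cross_attention` are hypothetical.

```python
import numpy as np


def gradient_structure_map(gray):
    """Gradient-magnitude map of a 2-D grayscale image.

    Assumption: central differences via np.gradient; the paper's actual
    gradient operator is not stated in the abstract.
    """
    gray = np.asarray(gray, dtype=np.float64)
    gy, gx = np.gradient(gray)  # derivatives along rows, then columns
    return np.hypot(gx, gy)     # per-pixel gradient magnitude


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens):
    """Single-head scaled dot-product cross-attention, no learned weights.

    Assumption: structure tokens are the queries and appearance tokens
    supply keys and values, as one possible reading of
    "structure-dominated" fusion.
    """
    dim = query_tokens.shape[-1]
    scores = query_tokens @ context_tokens.T / np.sqrt(dim)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ context_tokens, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy image with a vertical step edge; the gradient map is nonzero
    # only around the edge.
    image = np.zeros((8, 8))
    image[:, 4:] = 1.0
    structure_map = gradient_structure_map(image)
    print("structure map shape:", structure_map.shape)

    # Hypothetical token sets: 4 structure tokens, 3 global and 3 local
    # appearance tokens, all of dimension 16.
    structure = rng.normal(size=(4, 16))
    appearance = np.concatenate(
        [rng.normal(size=(3, 16)), rng.normal(size=(3, 16))], axis=0
    )
    fused, weights = cross_attention(structure, appearance)
    print("fused tokens shape:", fused.shape)
```

In this sketch the fused output has one row per structure token, so the structure branch determines where attention is directed while the appearance branch supplies the content that is mixed. A real transformer block would add learned query, key, and value projections, multiple heads, residual connections, and normalization.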
Pages: 3600-3604
Number of pages: 5