Multi-Scale Attention Learning Network for Facial Expression Recognition

Citations: 6
Authors
Dong, Qian [1]
Ren, Weihong [1]
Gao, Yu [1]
Jiang, Weibo [1]
Liu, Honghai [1]
Affiliations
[1] Harbin Inst Technol, Sch Mech Engn & Automat, State Key Lab Robot & Syst, Shenzhen 518055, Peoples R China
Keywords
Facial expression recognition; multi-scale attention; vision transformer
DOI
10.1109/LSP.2023.3336257
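Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract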
Facial Expression Recognition (FER) aims to identify emotional expressions in human faces, and it is a fundamental task in computer vision. Recently, some methods have applied Vision Transformer (ViT) to FER and achieved promising results. However, FER still suffers from two key issues: inter-class similarity and intra-class discrepancy. To address these issues, in this letter, we propose a Multi-Scale Attention Learning Network (MALN) based on ViT, which can learn facial expression embeddings in a multi-scale manner. Specifically, we adopt a multi-branch ViT architecture to adaptively explore multi-scale correlations with self-attention. Furthermore, we design a Scale Distinction Loss (SDL) to dynamically regulate facial embeddings from multiple branches, which can guide ViT to capture discriminative facial regions. Experimental results on three public datasets (RAF-DB, AffectNet and FERPlus) demonstrate the effectiveness of our proposed MALN for FER.
Pages: 1732-1736
Page count: 5