Multi-Scale Attention Learning Network for Facial Expression Recognition

Times cited: 6

Authors:

Dong, Qian [1]

Ren, Weihong [1]

Gao, Yu [1]

Jiang, Weibo [1]

Liu, Honghai [1]

Affiliation:

[1] Harbin Inst Technol, Sch Mech Engn & Automat, State Key Lab Robot & Syst, Shenzhen 518055, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2023 / Vol. 30

Keywords:

Facial expression recognition; multi-scale attention; vision transformer

DOI:

10.1109/LSP.2023.3336257

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline codes:

0808; 0809

Abstract:

Facial Expression Recognition (FER) aims to identify emotional expressions in human faces and is a fundamental task in computer vision. Recently, several methods have applied the Vision Transformer (ViT) to FER and achieved promising results. However, FER still suffers from two key issues: inter-class similarity and intra-class discrepancy. To address these issues, in this letter we propose a Multi-Scale Attention Learning Network (MALN) based on ViT, which learns facial expression embeddings in a multi-scale manner. Specifically, we adopt a multi-branch ViT architecture to adaptively explore multi-scale correlations without self-attention. Furthermore, we design a Scale Distinction Loss (SDL) to dynamically regulate the facial embeddings from the multiple branches, which guides ViT to capture discriminative facial regions. Experimental results on three public datasets (RAF-DB, AffectNet, and FERPlus) demonstrate the effectiveness of the proposed MALN for FER.

Pages: 1732–1736

Number of pages: 5

Cited references: 39
