Robust facial expression recognition with Transformer Block Enhancement Module

Cited by: 6
Authors
Xie, Yuanlun [1, 2]
Tian, Wenhong [1, 2]
Yu, Zitong [3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Software Engn, Chengdu 610054, Peoples R China
[2] Chengdu ZhongKeYunJi Informat Technol Co Ltd, Chengdu 610054, Peoples R China
[3] Great Bay Univ, Dongguan 523000, Peoples R China
Keywords
Facial expression recognition; Neural network; Deep learning; Transformer Block Enhancement Module; ATTENTION; MODELS
DOI
10.1016/j.engappai.2023.106795
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recently, facial expression recognition (FER) methods have achieved significant progress. However, FER is still challenged by factors such as uneven illumination and low-quality expression images. Exploring the potential of facial expression features to achieve robust FER is particularly important. Inspired by the Transformer's excellent performance in modeling long-range dependencies and in vision tasks, this paper proposes a Transformer Block Enhancement Module (TBEM) for enhancing the feature representation of facial expressions. The proposed module contains a Channel Enhancement (CE) block and a Spatial Enhancement (SE) block. The CE block can adaptively enhance the expression features on the channel dimension by effectively leveraging the channel dependency information, while the SE block enhances the expression features on the spatial dimension by integrating the spatial dependency information. TBEM can output a more robust expression representation by combining CE and SE. The proposed artificial intelligence learning module greatly improves the recognition accuracy of FER engineering tasks. To further illustrate the application of TBEM in real-world FER engineering, three engineering problems are used for verification. Extensive experiments demonstrate that the proposed method improves FER performance by focusing on more accurate decisional features and can be easily embedded into regular convolutional neural network models to help them improve the accuracy on FER tasks by about 2.64%-3.03%. The results show the proposed method achieves accuracies of 90.57% on FERPlus, 89.41% on RAFDB basic and 68.43% on RAFDB compound, respectively. The proposed method also provides a meaningful reference for further research on applying Transformer to other tasks.
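The abstract describes the CE and SE blocks only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a feature map flattened to C channels by N spatial positions, uses plain scaled dot-product self-attention with identity query/key/value projections and a residual connection, and applies channel enhancement before spatial enhancement. All function names (`self_attention`, `channel_enhance`, `spatial_enhance`, `tbem_sketch`) and the ordering of the two blocks are assumptions made for illustration.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(mat):
    return [list(col) for col in zip(*mat)]

def self_attention(tokens):
    # Scaled dot-product self-attention over a list of token vectors.
    # Assumption: identity Q/K/V projections (no learned weights),
    # followed by a residual connection.
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, tokens))
                 for j in range(dim)]
        out.append([x + y for x, y in zip(q, mixed)])
    return out

def channel_enhance(feat):
    # Each of the C channels is one token of length N = H * W,
    # so attention mixes information across channels.
    return self_attention(feat)

def spatial_enhance(feat):
    # Each of the N spatial positions is one token of length C,
    # so attention mixes information across positions.
    return transpose(self_attention(transpose(feat)))

def tbem_sketch(feat):
    # Assumed ordering: channel enhancement first, then spatial.
    return spatial_enhance(channel_enhance(feat))

# Toy feature map: C = 3 channels, N = 4 spatial positions.
feat = [[0.1, 0.4, 0.2, 0.3],
        [0.5, 0.1, 0.0, 0.2],
        [0.3, 0.3, 0.9, 0.1]]
out = tbem_sketch(feat)
print(len(out), len(out[0]))  # shape is preserved: 3 4
```

Because the output keeps the input shape, a module of this kind can sit between two convolutional stages without changing the rest of the network, which is consistent with the abstract's claim that TBEM can be embedded into regular convolutional models.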
Pages: 13