Towards Efficient Deep Learning Models for Facial Expression Recognition using Transformers

Cited by: 3
Authors
Safavi, Farshad [1 ]
Patel, Kulin [1 ]
Vinjamuri, Ramana Kumar [1 ]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Comp Sci & Elect Engn, Baltimore, MD 21250 USA
Source
2023 IEEE 19TH INTERNATIONAL CONFERENCE ON BODY SENSOR NETWORKS (BSN) | 2023
Keywords
Facial Expression Recognition; Deep learning; Classification; Emotion detection; Transformer; SCALE;
DOI
10.1109/BSN58485.2023.10331041
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Discipline Codes
081203; 0835;
Abstract
Facial expression recognition (FER) is crucial in various healthcare applications, including pain assessment, mental disorder diagnosis, and assistive robots that require close interaction with humans. While heavyweight deep learning models can achieve high accuracy for FER, their computational cost and memory consumption are often prohibitive for portable and mobile devices. Efficient deep learning models that retain high accuracy are therefore essential to enable FER on resource-constrained platforms. This paper presents a new efficient deep learning model for facial expression recognition. The model utilizes Mix Transformer (MiT) blocks, adopted from the SegFormer architecture, along with a supplemented fusion block. The efficient self-attention mechanism in the transformer focuses on information relevant to classifying different facial expressions while significantly improving efficiency. Furthermore, our supplemented fusion block integrates multiscale feature maps to capture both fine-grained and coarse features. Experimental results demonstrate that the proposed model significantly reduces computational cost, latency, and the number of learnable parameters while achieving high accuracy compared with the previous state-of-the-art (SOTA) on the FER2013 dataset.
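To make the efficiency claim in the abstract concrete, the following is a minimal PyTorch sketch of SegFormer-style efficient self-attention, in which keys and values are computed from a spatially reduced copy of the token grid (reduction ratio sr_ratio), cutting attention cost from O(N^2) to roughly O(N^2 / sr_ratio^2). The module name, layer sizes, head count, and the 12x12 token grid (illustrating a 48x48 FER2013 face image tokenized at stride 4) are illustrative assumptions, not the authors' released implementation, and the paper's supplemented fusion block is not reproduced here.

# Sketch of SegFormer-style efficient self-attention (assumed details, not the paper's code).
import torch
import torch.nn as nn

class EfficientSelfAttention(nn.Module):
    def __init__(self, dim, num_heads=4, sr_ratio=2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        self.sr_ratio = sr_ratio
        if sr_ratio > 1:
            # Strided convolution shrinks the key/value sequence by sr_ratio**2.
            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
            self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        # x: (B, N, C) token sequence with N = H * W.
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        if self.sr_ratio > 1:
            # Reduce the spatial resolution before computing keys and values.
            x_ = x.transpose(1, 2).reshape(B, C, H, W)
            x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
            x_ = self.norm(x_)
        else:
            x_ = x
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]

        # Attention over N queries but only N / sr_ratio**2 keys/values.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Example: a 48x48 face image tokenized into a 12x12 grid of 64-d tokens (hypothetical sizes).
tokens = torch.randn(1, 12 * 12, 64)
attn = EfficientSelfAttention(dim=64, num_heads=4, sr_ratio=2)
print(attn(tokens, H=12, W=12).shape)  # torch.Size([1, 144, 64])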
Pages: 4