VaBTFER: An Effective Variant Binary Transformer for Facial Expression Recognition

Cited by: 2
Authors
Shen, Lei [1]
Jin, Xing [1]
Affiliation
[1] Nanjing Forestry University, College of Information Science and Technology, Nanjing 100190, People's Republic of China
Keywords
facial expression recognition; spatial-channel feature relevance Transformer; lightweight variant Transformer; binary quantization mechanism; multilayer channel reduction self-attention; dynamic learnable information extraction; neural network
DOI
10.3390/s24010147
Chinese Library Classification (CLC)
O65 [Analytical Chemistry]
Discipline codes
070302; 081704
Abstract
Existing Transformer-based models have achieved impressive success in facial expression recognition (FER) by modeling the long-range relationships among facial muscle movements. However, pure Transformer-based models typically contain millions of parameters, which makes them challenging to deploy. Moreover, the Transformer's lack of inductive bias usually makes it difficult to train from scratch on limited FER datasets. To address these problems, we propose an effective and lightweight variant Transformer for FER called VaTFER. In VaTFER, we first construct action unit (AU) tokens from AU-based regions and their histogram of oriented gradients (HOG) features. Then, we present a novel spatial-channel feature relevance Transformer (SCFRT) module, which incorporates multilayer channel reduction self-attention (MLCRSA) and a dynamic learnable information extraction (DLIE) mechanism. MLCRSA models long-range dependencies among all tokens while decreasing the number of parameters. DLIE aims to alleviate the lack of inductive bias and improve the learning ability of the model. Furthermore, we replace the vanilla multilayer perceptron (MLP) with an excitation module for accurate prediction. To further reduce computing and memory resources, we introduce a binary quantization mechanism, formulating a novel lightweight Transformer model called the variant binary Transformer for FER (VaBTFER). We conduct extensive experiments on several commonly used facial expression datasets, and the results attest to the effectiveness of our methods.
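To make the token-construction step concrete, the following is a minimal sketch of how per-region HOG descriptors could be stacked into a token matrix using scikit-image's hog function. The region boxes and patch sizes are illustrative placeholders, not the AU-based regions used in the paper.

```python
# Hypothetical sketch: per-region HOG descriptors stacked as token vectors.
# The boxes below are illustrative placeholders, not the paper's AU regions.
import numpy as np
from skimage.feature import hog

def au_tokens(face_gray, boxes, orientations=9):
    """face_gray: (H, W) grayscale face crop; boxes: list of (y0, y1, x0, x1)."""
    tokens = []
    for y0, y1, x0, x1 in boxes:
        patch = face_gray[y0:y1, x0:x1]
        desc = hog(patch, orientations=orientations,
                   pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        tokens.append(desc)
    return np.stack(tokens)  # (num_regions, hog_dim) token matrix

# Example: four equal-size illustrative regions on a 128x128 face crop
face = np.random.rand(128, 128)
boxes = [(0, 64, 0, 64), (0, 64, 64, 128), (64, 128, 0, 64), (64, 128, 64, 128)]
print(au_tokens(face, boxes).shape)  # (4, 1764)
```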
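The abstract does not detail MLCRSA, but a plausible reading of "channel reduction self-attention" is an attention layer whose query, key, and value projections map into a smaller channel dimension, cutting parameter count. The PyTorch sketch below illustrates that idea under this assumption; the class name, dimensions, and head count are ours, not the paper's.

```python
# A minimal sketch, assuming "channel reduction" means projecting queries,
# keys, and values into a reduced channel space before attention.
import torch
import torch.nn as nn

class ChannelReducedSelfAttention(nn.Module):
    def __init__(self, dim, reduced_dim, num_heads=4):
        super().__init__()
        assert reduced_dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (reduced_dim // num_heads) ** -0.5
        # Projecting into reduced_dim < dim shrinks the parameter count of
        # the attention projections relative to vanilla self-attention.
        self.qkv = nn.Linear(dim, 3 * reduced_dim)
        self.proj = nn.Linear(reduced_dim, dim)

    def forward(self, x):                       # x: (B, N, dim)
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)  # each (B, N, reduced_dim)
        def heads(t):
            return t.view(B, N, self.num_heads, -1).transpose(1, 2)
        q, k, v = heads(q), heads(k), heads(v)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)                   # back to (B, N, dim)

x = torch.randn(2, 49, 256)
print(ChannelReducedSelfAttention(256, 64)(x).shape)  # torch.Size([2, 49, 256])
```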
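For the binary quantization mechanism, a common baseline (in the spirit of Courbariaux et al.'s binarized networks) is sign-based weight binarization trained with a straight-through estimator. The sketch below shows that baseline; the paper's actual quantization scheme is not specified in the abstract and may differ.

```python
# A minimal sketch of weight binarization with a straight-through estimator;
# this is the standard baseline, not necessarily the paper's exact mechanism.
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return w.sign()                      # weights quantized to {-1, +1}

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients only where |w| <= 1
        return grad_out * (w.abs() <= 1).float()

class BinaryLinear(nn.Linear):
    def forward(self, x):
        # Scale by the mean absolute weight to reduce quantization error
        alpha = self.weight.abs().mean()
        w_bin = BinarizeSTE.apply(self.weight) * alpha
        return nn.functional.linear(x, w_bin, self.bias)

layer = BinaryLinear(64, 32)
out = layer(torch.randn(8, 64))
out.sum().backward()                         # gradients flow through the STE
print(out.shape, layer.weight.grad is not None)
```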
Pages: 20