Dynamic facial expression recognition based on spatial key-points optimized region feature fusion and temporal self-attention

Cited by: 1
Authors
Huang, Zhiwei [1 ]
Zhu, Yu [1 ]
Li, Hangyu [1 ]
Yang, Dawei [2 ,3 ]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Pulm & Crit Care Med, Shanghai 200032, Peoples R China
[3] Shanghai Engn Res Ctr Internet Things Resp Med, Shanghai 200032, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Dynamic facial expression recognition; Spatial feature fusion; Graph convolution network; Self-attention;
DOI
10.1016/j.engappai.2024.108535
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Dynamic facial expression recognition (DFER) is of great significance in promoting empathetic machines and metaverse technology. However, DFER in the wild remains a challenging task, often constrained by complex lighting changes, frequent key-point occlusion, uncertain emotional peaks, and severely imbalanced dataset categories. To tackle these problems, this paper presents a deep neural network model based on spatial key-points optimized region feature fusion and temporal self-attention. The method comprises three parts: a spatial feature extraction module, a temporal feature extraction module, and a region feature fusion module. The intra-frame spatial feature extraction module is composed of a key-points graph convolutional network (GCN) branch and a convolutional neural network (CNN) branch, which produce the global and local feature vectors. A newly proposed region fusion strategy based on facial spatial structure is used to obtain the spatial fusion feature of each frame. The inter-frame temporal feature extraction module uses a multi-head self-attention model to capture temporal information across frames. The experimental results show that our method achieves accuracies of 68.73%, 55.00%, 47.80%, and 47.44% on the DFEW, AFEW, FERV39k, and MAFW datasets, respectively. Ablation experiments show that the GCN module, fusion module, and temporal module improve the accuracy on DFEW by 0.68%, 1.66%, and 3.25%, respectively. The method also achieves competitive results in terms of parameter count and inference speed, which demonstrates the effectiveness of the proposed method.
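The pipeline the abstract describes, per-frame spatial features from a key-points GCN combined with a CNN global vector, then multi-head self-attention over the frame sequence, can be sketched roughly as follows. This is a minimal NumPy illustration under assumed shapes and random weights, not the authors' implementation: the toy adjacency, the mean-pool-and-concatenate fusion, and all dimensions are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(X, A, W):
    """One graph-convolution layer over facial key-points.
    X: (N, d) node features, A: (N, N) adjacency, W: (d, d_out) weights."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = (A_hat * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)          # ReLU

def self_attention(X, Wq, Wk, Wv):
    """Single attention head over the frame sequence X: (T, d_in)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ V

# Toy dimensions (hypothetical): 16 frames, 68 key-points, feature width 32
T, N, d = 16, 68, 32
A = (rng.random((N, N)) < 0.05).astype(float)
A = np.maximum(A, A.T)                              # symmetric key-point graph

fused = []
for _ in range(T):
    kp_feats = rng.standard_normal((N, d))          # per-key-point local features
    cnn_global = rng.standard_normal(d)             # CNN global feature (stand-in)
    local = gcn_layer(kp_feats, A, rng.standard_normal((d, d)) / np.sqrt(d))
    # fusion stand-in: pool GCN outputs and concatenate with the global vector
    fused.append(np.concatenate([local.mean(axis=0), cnn_global]))
X = np.stack(fused)                                 # (T, 2d) per-frame fused features

heads = [self_attention(X, *(rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)
                             for _ in range(3))) for _ in range(4)]
temporal = np.concatenate(heads, axis=-1)           # (T, 4d) temporal features
print(temporal.shape)
```

A classifier head over these temporal features would complete the recognizer; the real model's region-wise fusion over facial structure is richer than the single mean-pool shown here.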
Pages: 12