Dynamic facial expression recognition based on spatial key-points optimized region feature fusion and temporal self-attention

Cited by: 1
Authors
Huang, Zhiwei [1 ]
Zhu, Yu [1 ]
Li, Hangyu [1 ]
Yang, Dawei [2 ,3 ]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Pulm & Crit Care Med, Shanghai 200032, Peoples R China
[3] Shanghai Engn Res Ctr Internet Things Resp Med, Shanghai 200032, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Dynamic facial expression recognition; Spatial feature fusion; Graph convolution network; Self-attention;
DOI
10.1016/j.engappai.2024.108535
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Dynamic facial expression recognition (DFER) is of great significance for empathetic machines and metaverse technology. However, DFER in the wild remains challenging, often constrained by complex lighting changes, frequent key-point occlusion, uncertain emotional peaks, and severely imbalanced dataset categories. To tackle these problems, this paper presents a deep neural network model based on spatial key-points optimized region feature fusion and temporal self-attention. The method comprises three parts: a spatial feature extraction module, a temporal feature extraction module, and a region feature fusion module. The intra-frame spatial feature extraction module consists of a key-points graph convolution network (GCN) branch and a convolutional neural network (CNN) branch that provide the global and local feature vectors. A newly proposed region fusion strategy based on the spatial structure of the face produces the fused spatial feature of each frame, and the inter-frame temporal feature extraction module uses a multi-head self-attention model to capture temporal information across frames. Experimental results show that our method achieves accuracies of 68.73%, 55.00%, 47.80%, and 47.44% on the DFEW, AFEW, FERV39k, and MAFW datasets, respectively. Ablation experiments show that the GCN module, the fusion module, and the temporal module improve accuracy on DFEW by 0.68%, 1.66%, and 3.25%, respectively. The method also achieves competitive results in terms of parameter count and inference speed, which demonstrates its effectiveness.
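The abstract outlines a two-branch spatial extractor (key-points GCN plus CNN), a per-frame region feature fusion step, and inter-frame multi-head self-attention. The code below is a minimal PyTorch sketch of that pipeline; all module names, tensor shapes, layer widths, and the learnable adjacency matrix are illustrative assumptions rather than the authors' released implementation, and the paper's region fusion strategy is simplified here to concatenation followed by a linear layer.

```python
# Minimal sketch of the three-module pipeline described in the abstract.
# Everything below (names, dimensions, simplified fusion) is assumed, not the paper's code.
import torch
import torch.nn as nn


class KeypointGCN(nn.Module):
    """Simple graph convolution over facial key-points (key-points branch)."""
    def __init__(self, num_kpts=68, in_dim=2, hid_dim=64):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, hid_dim)
        # Learnable adjacency as a stand-in for the face key-point graph.
        self.adj = nn.Parameter(torch.eye(num_kpts))

    def forward(self, kpts):               # kpts: (B*T, K, 2)
        a = torch.softmax(self.adj, dim=-1)
        x = torch.relu(a @ self.w1(kpts))  # one GCN layer: A · (X W)
        x = torch.relu(a @ self.w2(x))
        return x.mean(dim=1)               # pooled key-point feature: (B*T, hid_dim)


class FrameCNN(nn.Module):
    """Lightweight CNN branch for the per-frame appearance feature."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, out_dim)

    def forward(self, frames):             # frames: (B*T, 3, H, W)
        return self.fc(self.backbone(frames).flatten(1))


class DFERSketch(nn.Module):
    """Per-frame spatial fusion, then inter-frame multi-head self-attention."""
    def __init__(self, num_classes=7, d_model=192):
        super().__init__()
        self.gcn = KeypointGCN()
        self.cnn = FrameCNN()
        # Region feature fusion simplified to concat + linear projection.
        self.fuse = nn.Linear(64 + 128, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.cls = nn.Linear(d_model, num_classes)

    def forward(self, frames, kpts):        # frames: (B, T, 3, H, W), kpts: (B, T, K, 2)
        b, t = frames.shape[:2]
        glb = self.cnn(frames.flatten(0, 1))            # appearance feature per frame
        loc = self.gcn(kpts.flatten(0, 1))              # key-point feature per frame
        f = self.fuse(torch.cat([loc, glb], dim=-1)).view(b, t, -1)
        f, _ = self.attn(f, f, f)                       # temporal modelling across frames
        return self.cls(f.mean(dim=1))                  # clip-level expression logits


if __name__ == "__main__":
    model = DFERSketch()
    logits = model(torch.randn(2, 16, 3, 112, 112), torch.randn(2, 16, 68, 2))
    print(logits.shape)                                 # torch.Size([2, 7])
```

Flattening the batch and time dimensions lets the two spatial branches run frame-wise before the sequence is reshaped back for the temporal self-attention, which is one common way to realize the intra-frame/inter-frame split the abstract describes.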
Pages: 12