Dynamic facial expression recognition based on spatial key-points optimized region feature fusion and temporal self-attention

Cited by: 1
Authors
Huang, Zhiwei [1 ]
Zhu, Yu [1 ]
Li, Hangyu [1 ]
Yang, Dawei [2 ,3 ]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Pulm & Crit Care Med, Shanghai 200032, Peoples R China
[3] Shanghai Engn Res Ctr Internet Things Resp Med, Shanghai 200032, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Dynamic facial expression recognition; Spatial feature fusion; Graph convolution network; Self-attention;
DOI
10.1016/j.engappai.2024.108535
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Dynamic facial expression recognition (DFER) is of great significance for empathetic machines and metaverse technology. However, DFER in the wild remains challenging, often constrained by complex lighting changes, frequent key-point occlusion, uncertain emotional peaks, and severely imbalanced dataset categories. To tackle these problems, this paper presents a deep neural network model based on spatial key-points optimized region feature fusion and temporal self-attention. The method comprises three parts: a spatial feature extraction module, a temporal feature extraction module, and a region feature fusion module. The intra-frame spatial feature extraction module consists of a key-points graph convolution network (GCN) branch and a convolutional neural network (CNN) branch that provide the global and local feature vectors. A newly proposed region fusion strategy based on the spatial structure of the face produces the fused spatial feature of each frame, and the inter-frame temporal feature extraction module uses a multi-head self-attention model to capture temporal information across frames. Experimental results show that our method achieves accuracies of 68.73%, 55.00%, 47.80%, and 47.44% on the DFEW, AFEW, FERV39k, and MAFW datasets, respectively. Ablation experiments show that the GCN module, the fusion module, and the temporal module improve accuracy on DFEW by 0.68%, 1.66%, and 3.25%, respectively. The method also achieves competitive results in terms of parameter count and inference speed, which demonstrates its effectiveness.
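The abstract outlines a two-branch spatial extractor (key-points GCN plus CNN), a per-frame region feature fusion step, and inter-frame multi-head self-attention. The code below is a minimal PyTorch sketch of that pipeline; all module names, tensor shapes, layer widths, and the learnable adjacency matrix are illustrative assumptions rather than the authors' released implementation, and the paper's region fusion strategy is simplified here to concatenation followed by a linear layer.

```python
# Minimal sketch of the three-module pipeline described in the abstract.
# Everything below (names, dimensions, simplified fusion) is assumed, not the paper's code.
import torch
import torch.nn as nn


class KeypointGCN(nn.Module):
    """Simple graph convolution over facial key-points (key-points branch)."""
    def __init__(self, num_kpts=68, in_dim=2, hid_dim=64):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, hid_dim)
        # Learnable adjacency as a stand-in for the face key-point graph.
        self.adj = nn.Parameter(torch.eye(num_kpts))

    def forward(self, kpts):               # kpts: (B*T, K, 2)
        a = torch.softmax(self.adj, dim=-1)
        x = torch.relu(a @ self.w1(kpts))  # one GCN layer: A · (X W)
        x = torch.relu(a @ self.w2(x))
        return x.mean(dim=1)               # pooled key-point feature: (B*T, hid_dim)


class FrameCNN(nn.Module):
    """Lightweight CNN branch for the per-frame appearance feature."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, out_dim)

    def forward(self, frames):             # frames: (B*T, 3, H, W)
        return self.fc(self.backbone(frames).flatten(1))


class DFERSketch(nn.Module):
    """Per-frame spatial fusion, then inter-frame multi-head self-attention."""
    def __init__(self, num_classes=7, d_model=192):
        super().__init__()
        self.gcn = KeypointGCN()
        self.cnn = FrameCNN()
        # Region feature fusion simplified to concat + linear projection.
        self.fuse = nn.Linear(64 + 128, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.cls = nn.Linear(d_model, num_classes)

    def forward(self, frames, kpts):        # frames: (B, T, 3, H, W), kpts: (B, T, K, 2)
        b, t = frames.shape[:2]
        glb = self.cnn(frames.flatten(0, 1))            # appearance feature per frame
        loc = self.gcn(kpts.flatten(0, 1))              # key-point feature per frame
        f = self.fuse(torch.cat([loc, glb], dim=-1)).view(b, t, -1)
        f, _ = self.attn(f, f, f)                       # temporal modelling across frames
        return self.cls(f.mean(dim=1))                  # clip-level expression logits


if __name__ == "__main__":
    model = DFERSketch()
    logits = model(torch.randn(2, 16, 3, 112, 112), torch.randn(2, 16, 68, 2))
    print(logits.shape)                                 # torch.Size([2, 7])
```

Flattening the batch and time dimensions lets the two spatial branches run frame-wise before the sequence is reshaped back for the temporal self-attention, which is one common way to realize the intra-frame/inter-frame split the abstract describes.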
Pages: 12