Dynamic facial expression recognition based on spatial key-points optimized region feature fusion and temporal self-attention

Cited by: 1
Authors
Huang, Zhiwei [1 ]
Zhu, Yu [1 ]
Li, Hangyu [1 ]
Yang, Dawei [2 ,3 ]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Pulm & Crit Care Med, Shanghai 200032, Peoples R China
[3] Shanghai Engn Res Ctr Internet Things Resp Med, Shanghai 200032, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Dynamic facial expression recognition; Spatial feature fusion; Graph convolution network; Self-attention;
DOI
10.1016/j.engappai.2024.108535
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Dynamic facial expression recognition (DFER) is of great significance in promoting empathetic machines and metaverse technology. However, DFER in the wild remains a challenging task, often constrained by complex lighting changes, frequent key-point occlusion, uncertain emotional peaks, and severely imbalanced dataset categories. To tackle these problems, this paper presents a deep neural network model based on spatial key-points optimized region feature fusion and temporal self-attention. The method comprises three parts: a spatial feature extraction module, a temporal feature extraction module, and a region feature fusion module. The intra-frame spatial feature extraction module is composed of a key-points graph convolutional network (GCN) branch and a convolutional neural network (CNN) branch, which produce the global and local feature vectors. A newly proposed region fusion strategy based on facial spatial structure is used to obtain the spatial fusion feature of each frame. The inter-frame temporal feature extraction module uses a multi-head self-attention model to capture temporal information across frames. The experimental results show that our method achieves accuracies of 68.73%, 55.00%, 47.80%, and 47.44% on the DFEW, AFEW, FERV39k, and MAFW datasets, respectively. Ablation experiments show that the GCN module, fusion module, and temporal module improve the accuracy on DFEW by 0.68%, 1.66%, and 3.25%, respectively. The method also achieves competitive results in terms of parameter count and inference speed, which demonstrates the effectiveness of the proposed method.
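The pipeline the abstract describes, per-frame spatial features from a key-points GCN combined with a CNN global vector, then multi-head self-attention over the frame sequence, can be sketched roughly as follows. This is a minimal NumPy illustration under assumed shapes and random weights, not the authors' implementation: the toy adjacency, the mean-pool-and-concatenate fusion, and all dimensions are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(X, A, W):
    """One graph-convolution layer over facial key-points.
    X: (N, d) node features, A: (N, N) adjacency, W: (d, d_out) weights."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = (A_hat * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)          # ReLU

def self_attention(X, Wq, Wk, Wv):
    """Single attention head over the frame sequence X: (T, d_in)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ V

# Toy dimensions (hypothetical): 16 frames, 68 key-points, feature width 32
T, N, d = 16, 68, 32
A = (rng.random((N, N)) < 0.05).astype(float)
A = np.maximum(A, A.T)                              # symmetric key-point graph

fused = []
for _ in range(T):
    kp_feats = rng.standard_normal((N, d))          # per-key-point local features
    cnn_global = rng.standard_normal(d)             # CNN global feature (stand-in)
    local = gcn_layer(kp_feats, A, rng.standard_normal((d, d)) / np.sqrt(d))
    # fusion stand-in: pool GCN outputs and concatenate with the global vector
    fused.append(np.concatenate([local.mean(axis=0), cnn_global]))
X = np.stack(fused)                                 # (T, 2d) per-frame fused features

heads = [self_attention(X, *(rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)
                             for _ in range(3))) for _ in range(4)]
temporal = np.concatenate(heads, axis=-1)           # (T, 4d) temporal features
print(temporal.shape)
```

A classifier head over these temporal features would complete the recognizer; the real model's region-wise fusion over facial structure is richer than the single mean-pool shown here.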
Pages: 12