Video Expression Recognition Method Based on Facial Motion Unit and Temporal Attention

Cited: 0
Authors
Hu M. [1 ]
Hu P. [1 ]
Ge P. [1 ]
Wang X. [1 ]
Zhang K. [1 ]
Ren F. [1 ,2 ]
Affiliations
[1] Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, School of Computer and Information, Hefei University of Technology, Hefei
[2] Graduate School of Advanced Technology & Science, University of Tokushima, Tokushima
Keywords
facial action unit; region of interest segmentation; tag correction; temporal attention;
DOI
10.3724/SP.J.1089.2023.19284
Abstract
A video expression recognition method based on facial action units and temporal attention is proposed to address the problem that expression intensity varies across frames in a video sequence, which makes it difficult for a long short-term memory (LSTM) network to extract features effectively. First, a temporal attention module based on convolutional LSTM (ConvLSTM) is introduced to model video sequences along the temporal dimension, reducing dimensionality while retaining the rich feature information of face images. Second, a face-image segmentation rule based on facial action units is proposed to address the difficulty of defining the active regions of facial expressions. Finally, a label correction module is embedded in the model to handle sample uncertainty in datasets collected under natural conditions. Experimental results on the MMI, Oulu-CASIA, and AFEW datasets show that the method uses fewer parameters than published mainstream models and achieves an average recognition accuracy of 87.22% on the MMI dataset, exceeding current mainstream methods; its overall performance is better than that of current representative methods. © 2023 Institute of Computing Technology. All rights reserved.
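The abstract's temporal attention idea (weighting frames by relevance so that low-intensity frames contribute less to the video-level feature) can be illustrated with a minimal NumPy sketch. This is a generic attention-pooling example, not the paper's ConvLSTM module; the scoring vector `w` stands in for a learned projection, and all names are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_attention_pool(frame_feats, w):
    """Pool per-frame features into one video feature via temporal attention.

    frame_feats: (T, D) array, one D-dim feature vector per frame.
    w: (D,) scoring vector (stand-in for a learned attention projection).
    Returns the (T,) attention weights and the (D,) pooled feature.
    """
    scores = frame_feats @ w           # (T,) relevance score per frame
    alpha = softmax(scores)            # normalized temporal attention weights
    pooled = alpha @ frame_feats       # weighted sum over the time axis
    return alpha, pooled

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8))   # 16 frames, 8-dim features each
w = rng.standard_normal(8)
alpha, pooled = temporal_attention_pool(feats, w)
```

Frames with higher scores dominate `pooled`, which is how attention down-weights frames whose expression intensity is low.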
Pages: 108-117
Page count: 9
References
31 in total
  • [1] Calvo R A, D'Mello S., Affect detection: an interdisciplinary review of models, methods, and their applications, IEEE Transactions on Affective Computing, 1, 1, pp. 18-37, (2010)
  • [2] Liang D D, Liang H G, Yu Z B, et al., Deep convolutional BiLSTM fusion network for facial expression recognition, The Visual Computer, 36, 3, pp. 499-508, (2020)
  • [3] Huang K, Li J J, Cheng S C, et al., An efficient algorithm of facial expression recognition by TSG-RNN network, Proceedings of the International Conference on Multimedia Modeling, pp. 161-174, (2020)
  • [4] Zhou J Z, Zhang X M, Liu Y, et al., Facial expression recognition using spatial-temporal semantic graph network, Proceedings of the IEEE International Conference on Image Processing, pp. 1961-1965, (2020)
  • [5] Zhao X Y, Liang X D, Liu L Q, et al., Peak-piloted deep network for facial expression recognition, Proceedings of the European Conference on Computer Vision, pp. 425-442, (2016)
  • [6] Kim D H, Baddar W J, Jang J, et al., Multi-objective based spatio-temporal feature representation learning robust to expression intensity variations for facial expression recognition, IEEE Transactions on Affective Computing, 10, 2, pp. 223-236, (2019)
  • [7] Meng D B, Peng X J, Wang K, et al., Frame attention networks for facial expression recognition in videos, Proceedings of the IEEE International Conference on Image Processing, pp. 3866-3870, (2019)
  • [8] Zhou H S, Meng D B, Zhang Y Y, et al., Exploring emotion features and fusion strategies for audio-video emotion recognition, Proceedings of the International Conference on Multimodal Interaction, pp. 562-566, (2019)
  • [9] Chen W C, Zhang D, Li M, et al., STCAM: spatial-temporal and channel attention module for dynamic facial expression recognition, IEEE Transactions on Affective Computing
  • [10] Shi X J, Chen Z R, Wang H, et al., Convolutional LSTM network: a machine learning approach for precipitation nowcasting, Proceedings of the 28th International Conference on Neural Information Processing Systems, pp. 802-810, (2015)