Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks

Cited by: 5
Authors
Teixeira, Thomas [1]
Granger, Eric [1]
Lameiras Koerich, Alessandro [1]
Affiliations
[1] Univ Quebec, Ecole Technol Super, 1100 Rue Notre Dame Ouest, Montreal, PQ H3C 1K3, Canada
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 24
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
facial expression recognition; deep learning; convolutional recurrent neural networks; inflated 3D CNNs; dimensional emotion representation; long short-term memory; FACIAL EXPRESSIONS; DEEP; FEATURES; IMAGE; FACE;
DOI
10.3390/app112411738
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Facial expressions are one of the most powerful ways to depict specific patterns in human behavior and describe the human emotional state. However, despite the impressive advances of affective computing over the last decade, automatic video-based systems for facial expression recognition still cannot correctly handle variations in facial expression among individuals as well as cross-cultural and demographic aspects. Nevertheless, recognizing facial expressions is a difficult task, even for humans. This paper investigates the suitability of state-of-the-art deep learning architectures based on convolutional neural networks (CNNs) to deal with long video sequences captured in the wild for continuous emotion recognition. To this end, several 2D CNN models that were designed to model spatial information are extended to allow spatiotemporal representation learning from videos, considering a complex and multi-dimensional emotion space, where continuous values of valence and arousal must be predicted. We have developed and evaluated two families of models: convolutional recurrent neural networks, which combine 2D CNNs with long short-term memory units, and inflated 3D CNN models, which are built by inflating the weights of a pre-trained 2D CNN model during fine-tuning, using application-specific videos. Experimental results on the challenging SEWA-DB dataset have shown that these architectures can effectively be fine-tuned to encode spatiotemporal information from successive raw pixel images and achieve state-of-the-art results on such a dataset.
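The abstract does not spell out how the 2D weights are inflated. As a rough illustration only, the sketch below follows a commonly used inflation scheme: each 2D kernel is repeated along a new temporal axis and rescaled by the temporal depth, so that a clip of identical frames produces the same response as the original 2D filter on a single frame. The function name `inflate_conv2d_kernel`, the weight layout `(out, in, kH, kW)`, and the temporal depth of 5 are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def inflate_conv2d_kernel(kernel_2d: np.ndarray, time_depth: int) -> np.ndarray:
    """Inflate 2D conv weights (out, in, kH, kW) to 3D weights (out, in, T, kH, kW).

    Hypothetical helper: repeats the spatial kernel T times along a new
    temporal axis and divides by T, so the temporal sum equals the 2D kernel.
    """
    if kernel_2d.ndim != 4:
        raise ValueError("expected weights shaped (out, in, kH, kW)")
    if time_depth < 1:
        raise ValueError("time_depth must be >= 1")
    # Insert a temporal axis at position 2 and copy the kernel T times along it.
    kernel_3d = np.repeat(kernel_2d[:, :, np.newaxis, :, :], time_depth, axis=2)
    # Rescale so that summing over the temporal axis recovers the 2D weights.
    return kernel_3d / time_depth


# Stand-in for pre-trained 2D weights (random values, for illustration only).
rng = np.random.default_rng(0)
w2d = rng.standard_normal((8, 3, 3, 3))

# Assumed temporal depth of 5 frames.
w3d = inflate_conv2d_kernel(w2d, time_depth=5)
```

The inflated weights would then serve only as an initialization; according to the abstract, the resulting 3D model is subsequently fine-tuned on application-specific videos.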
Pages: 21
Related papers
75 in total
[41]   ImageNet Classification with Deep Convolutional Neural Networks [J].
Krizhevsky, Alex ;
Sutskever, Ilya ;
Hinton, Geoffrey E. .
COMMUNICATIONS OF THE ACM, 2017, 60 (06) :84-90
[42]   A Deeper Look at Facial Expression Dataset Bias [J].
Li, Shan ;
Deng, Weihong .
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (02) :881-893
[43]   Reliable Crowdsourcing and Deep Locality-Preserving Learning for Unconstrained Facial Expression Recognition [J].
Li, Shan ;
Deng, Weihong .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (01) :356-370
[44]   Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression Recognition in the Wild [J].
Li, Shan ;
Deng, Weihong ;
Du, JunPing .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :2584-2593
[45]   Occlusion Aware Facial Expression Recognition Using CNN With Attention Mechanism [J].
Li, Yong ;
Zeng, Jiabei ;
Shan, Shiguang ;
Chen, Xilin .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (05) :2439-2450
[46]  
Liu K, 2018, AAAI CONF ARTIF INTE, P7138
[47]   Adaptive Deep Metric Learning for Identity-Aware Facial Expression Recognition [J].
Liu, Xiaofeng ;
Kumar, B. V. K. Vijaya ;
You, Jane ;
Jia, Ping .
2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, :522-531
[48]  
Mou W., 2015, IEEE INT CONF AUTOMA, V5, P1
[49]   Deep spatio-temporal features for multimodal emotion recognition [J].
Nguyen, Dung ;
Nguyen, Kien ;
Sridharan, Sridha ;
Ghasemi, Afsane ;
Dean, David ;
Fookes, Clinton .
2017 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2017), 2017, :1215-1223
[50]   2D Principal Component Analysis for Face and Facial-Expression Recognition [J].
Oliveira, Luiz S. ;
Koerich, Alessandro L. ;
Mansano, Marcelo ;
Britto, Alceu S., Jr. .
COMPUTING IN SCIENCE & ENGINEERING, 2011, 13 (03) :9-13