Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks

Cited by: 5

Authors:

Teixeira, Thomas [1]

Granger, Eric [1]

Lameiras Koerich, Alessandro [1]

Affiliations:

[1] Univ Quebec, Ecole Technol Super, 1100 Rue Notre Dame Ouest, Montreal, PQ H3C 1K3, Canada

Source:

APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 24

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

facial expression recognition; deep learning; convolutional recurrent neural networks; inflated 3D CNNs; dimensional emotion representation; long short-term memory; FACIAL EXPRESSIONS; DEEP; FEATURES; IMAGE; FACE;

DOI:

10.3390/app112411738

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Facial expressions are one of the most powerful ways to depict specific patterns in human behavior and describe the human emotional state. However, despite the impressive advances of affective computing over the last decade, automatic video-based systems for facial expression recognition still cannot correctly handle variations in facial expression among individuals as well as cross-cultural and demographic aspects. Nevertheless, recognizing facial expressions is a difficult task, even for humans. This paper investigates the suitability of state-of-the-art deep learning architectures based on convolutional neural networks (CNNs) to deal with long video sequences captured in the wild for continuous emotion recognition. To this end, several 2D CNN models that were designed to model spatial information are extended to allow spatiotemporal representation learning from videos, considering a complex and multi-dimensional emotion space, where continuous values of valence and arousal must be predicted. We have developed and evaluated two families of models: convolutional recurrent neural networks, which combine 2D CNNs with long short-term memory (LSTM) units, and inflated 3D CNN models, which are built by inflating the weights of a pre-trained 2D CNN model during fine-tuning on application-specific videos. Experimental results on the challenging SEWA-DB dataset show that these architectures can be fine-tuned effectively to encode spatiotemporal information from successive raw pixel images and achieve state-of-the-art results on this dataset.
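
The weight inflation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which inflation scheme the authors use, so the sketch assumes one common variant: each pre-trained 2D kernel is replicated along a new temporal axis and divided by the temporal depth, so that a video made of identical frames produces the same response as the original 2D filter on a single frame. The function name `inflate_conv_kernel`, the `(out, in, T, kH, kW)` layout, and the use of NumPy are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def inflate_conv_kernel(kernel_2d: np.ndarray, time_dim: int) -> np.ndarray:
    """Inflate a 2D conv kernel (out, in, kH, kW) into a 3D one (out, in, T, kH, kW).

    Assumed scheme (not confirmed by the abstract): replicate the 2D weights
    along a new temporal axis and rescale by 1/T.
    """
    if kernel_2d.ndim != 4:
        raise ValueError("expected a kernel of shape (out, in, kH, kW)")
    if time_dim < 1:
        raise ValueError("time_dim must be >= 1")
    # Insert the temporal axis, then copy the 2D weights T times along it.
    kernel_3d = np.repeat(kernel_2d[:, :, np.newaxis, :, :], time_dim, axis=2)
    # Rescale so the weights summed over time equal the original 2D kernel.
    return kernel_3d / time_dim


# Hypothetical 3x3 kernel with 3 input channels and 2 output channels.
pretrained = np.arange(2 * 3 * 3 * 3, dtype=float).reshape(2, 3, 3, 3)
inflated = inflate_conv_kernel(pretrained, time_dim=4)
```

With this rescaling, summing `inflated` over its temporal axis recovers `pretrained`, which is why the 2D initialization stays meaningful when fine-tuning on video.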


Pages: 21
