Dynamic versus Static Facial Expressions in the Presence of Speech

Cited by: 4
Authors
Salman, Ali N. [1 ]
Busso, Carlos [1 ]
Affiliations
[1] Univ Texas Dallas, Dept Elect & Comp Engn, Multimodal Signal Proc MSP Lab, Richardson, TX 75080 USA
Source
2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020) | 2020
Keywords
RECOGNITION;
DOI
10.1109/FG47880.2020.00119
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Face analysis is an important area in affective computing. While studies have reported important progress in detecting emotions from still images, an open challenge is to determine emotions from videos, leveraging the dynamic nature of emotion externalization. A common approach in earlier studies is to process each frame of a video individually, aggregating the results obtained across frames. This study questions this approach, especially when the subjects are speaking. Speech articulation affects the appearance of the face, which can mislead emotional perception when isolated frames are taken out of context. The analysis in this study explores the similarities and differences in emotion perception between (1) videos of speaking segments (without audio), and (2) isolated frames from the same videos evaluated out of context. We consider the emotional categories happiness, sadness, anger, and neutral state, and the emotional attributes valence, arousal, and dominance, using the MSP-IMPROV corpus. The results consistently reveal that the emotional perception of static representations of emotion in isolated frames is significantly different from the overall emotional perception of dynamic representations in videos in the presence of speech. The results reveal the intrinsic limitations of the common frame-by-frame analysis of videos, highlighting the importance of explicitly modeling temporal and lexical information in facial emotion recognition from videos.
Pages: 436-443 (8 pages)
Related papers
30 records in total
[1]   Dissociable neural systems for recognizing emotions [J].
Adolphs, R ;
Tranel, D ;
Damasio, AR .
BRAIN AND COGNITION, 2003, 52 (01) :61-69
[2]   Deciphering the enigmatic face - The importance of facial dynamics in interpreting subtle facial expressions [J].
Ambadar, Z ;
Schooler, JW ;
Cohn, JF .
PSYCHOLOGICAL SCIENCE, 2005, 16 (05) :403-410
[3]  
[Anonymous], 2004, P 10 AUSTR INT C SPE
[4]   Sex differences in perception of emotion intensity in dynamic and static facial expressions [J].
Biele, Cezary ;
Grabowska, Anna .
EXPERIMENTAL BRAIN RESEARCH, 2006, 171 (01) :1-6
[5]  
Busso C., 2004, P 6 INT C MULT INT, P205, DOI 10.1145/1027933.1027968
[6]   Joint analysis of the emotional fingerprint in the face and speech: A single subject study [J].
Busso, Carlos ;
Narayanan, Shrikanth S. .
2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2007, :43+
[7]   Interrelation between speech and facial gestures in emotional utterances: A single subject study [J].
Busso, Carlos ;
Narayanan, Shrikanth S. .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (08) :2331-2347
[8]   MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception [J].
Busso, Carlos ;
Parthasarathy, Srinivas ;
Burmania, Alec ;
AbdelWahab, Mohammed ;
Sadoughi, Najmeh ;
Provost, Emily Mower .
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2017, 8 (01) :67-80
[9]  
Cauldwell R.T., 2000, PROC ISCA WORKSHOP S, P127
[10]   Facial expression recognition from video sequences: temporal and static modeling [J].
Cohen, I ;
Sebe, N ;
Garg, A ;
Chen, LS ;
Huang, TS .
COMPUTER VISION AND IMAGE UNDERSTANDING, 2003, 91 (1-2) :160-187