ISLA: Temporal Segmentation and Labeling for Audio-Visual Emotion Recognition

Cited by: 27
Authors
Kim, Yelin [1 ]
Provost, Emily Mower [2 ]
Affiliations
[1] SUNY Albany, Dept Elect & Comp Engn, Albany, NY 12206 USA
[2] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
Keywords
Audio-visual; emotion; recognition; multimodal; temporal; face region; speech; FACIAL EXPRESSION; SPEECH; CLASSIFICATION; MODALITIES; MOVEMENT; PROSODY; AREAS
DOI
10.1109/TAFFC.2017.2702653
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Emotion is an essential part of human interaction. Automatic emotion recognition can greatly benefit human-centered interactive technology, since extracted emotion can be used to understand and respond to user needs. However, real-world emotion recognition faces a central challenge when a user is speaking: facial movements due to speech are often confused with facial movements related to emotion. Recent studies have found that the use of phonetic information can reduce speech-related variability in the lower face region. However, methods to differentiate upper face movements due to emotion from those due to speech have been underexplored. This gap motivates our proposal of the Informed Segmentation and Labeling Approach (ISLA). ISLA uses the speech signal, which alters the dynamics of the lower and upper face regions differently, to inform segmentation and labeling. We demonstrate how pitch can be used to improve estimates of emotion from the upper face, and how this estimate can be combined with emotion estimates from the lower face and speech in a multimodal classification system. Our emotion classification results on the IEMOCAP and SAVEE datasets show that ISLA improves overall classification performance. We also demonstrate how emotion estimates from different modalities correlate with each other, providing insights into the differences between posed and spontaneous expressions.
Pages: 196-208
Page count: 13
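
The abstract describes combining emotion estimates from the upper face (informed by pitch), the lower face (informed by phonetic content), and speech into a single multimodal prediction. As a rough illustration only, the sketch below shows one generic way to fuse per-modality class posteriors via weighted late fusion; the class labels, weights, and function names are illustrative assumptions, not the fusion scheme used in the paper.

```python
import numpy as np

# Hypothetical emotion classes (IEMOCAP-style labels; illustrative only).
CLASSES = ["angry", "happy", "neutral", "sad"]

def fuse_posteriors(posteriors: dict, weights: dict) -> np.ndarray:
    """Weighted late fusion of per-modality class posteriors.

    posteriors: modality name -> probability vector over CLASSES.
    weights:    modality name -> fusion weight (need not sum to 1).
    """
    fused = np.zeros(len(CLASSES))
    total = 0.0
    for modality, p in posteriors.items():
        w = weights.get(modality, 1.0)
        fused += w * p
        total += w
    return fused / total  # renormalize to a probability vector

# Assumed posteriors from three modality-specific classifiers.
posteriors = {
    "upper_face": np.array([0.10, 0.20, 0.50, 0.20]),  # pitch-informed
    "lower_face": np.array([0.05, 0.60, 0.25, 0.10]),  # phoneme-informed
    "speech":     np.array([0.15, 0.45, 0.25, 0.15]),
}
weights = {"upper_face": 1.0, "lower_face": 1.0, "speech": 1.0}

fused = fuse_posteriors(posteriors, weights)
print(CLASSES[int(np.argmax(fused))])  # -> "happy"
```

With equal weights this reduces to simple posterior averaging; in practice the per-modality weights would be tuned on held-out data.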