A Multi-Task Learning Framework for Emotion Recognition Using 2D Continuous Space

Cited by: 130

Authors:

Xia, Rui [1]

Liu, Yang [1]

Affiliations:

[1] Univ Texas Dallas, Dept Comp Sci, Richardson, TX 75080 USA

Source:

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING | 2017 / Vol. 8 / No. 01

Keywords:

Categorical emotion recognition; multi-task learning; deep belief network; activation; valence; INFORMATION;

DOI:

10.1109/TAFFC.2015.2512598

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Dimensional models have been proposed in psychology studies to represent complex human emotional expressions. Activation and valence are two common dimensions in such models, and they can be used to describe certain emotions. For example, anger is an emotion with low valence and high activation, whereas neutral has medium levels of both valence and activation. In this work, we propose to apply multi-task learning to leverage activation and valence information for acoustic emotion recognition based on the deep belief network (DBN) framework. We treat the categorical emotion recognition task as the major task. For the secondary task, we leverage activation and valence labels in two different ways: category-level classification and continuous-level regression. The combination of the loss functions from the major and secondary tasks is used as the objective function in the multi-task learning framework. After iterative optimization, the values from the last hidden layer in the DBN are used as new features and fed into a support vector machine classifier for emotion recognition. Our experimental results on the Interactive Emotional Dyadic Motion Capture and Sustained Emotionally Colored Machine-Human Interaction Using Nonverbal Expression databases show significant improvements in unweighted accuracy, illustrating the benefit of utilizing additional information in a multi-task learning setup for emotion recognition.
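The abstract states only that the loss functions of the major and secondary tasks are combined into one objective; it does not give the exact form. The following is a minimal sketch of one plausible reading for the regression variant, assuming a weighted sum of a cross-entropy loss (categorical emotion) and a mean-squared-error loss (continuous activation and valence). The function names and the `secondary_weight` parameter are hypothetical, and the sketch covers neither the DBN itself nor the final SVM stage.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true emotion category.
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))

def mse(pred, target):
    # Mean squared error over all activation/valence values.
    return np.mean((pred - target) ** 2)

def multitask_loss(emotion_logits, emotion_labels, av_pred, av_target,
                   secondary_weight=0.5):
    # Assumption: the combined objective is a weighted sum of the two losses.
    # secondary_weight is a hypothetical hyperparameter, not from the source.
    primary = cross_entropy(emotion_logits, emotion_labels)
    secondary = mse(av_pred, av_target)
    return primary + secondary_weight * secondary

# Toy batch: 2 utterances, 4 emotion classes, 2 dimensions (activation, valence).
logits = np.zeros((2, 4))
labels = np.array([0, 3])
av_target = np.array([[0.1, 0.9], [0.5, 0.5]])
av_pred = av_target + 0.1
print(multitask_loss(logits, labels, av_pred, av_target))
```

For the classification variant of the secondary task, the `mse` term would be replaced by a second cross-entropy term over discretized activation and valence levels; how those levels are defined is not specified in the abstract.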


Pages: 3-14

Page count: 12
