Towards Speech Emotion Recognition "in the wild" using Aggregated Corpora and Deep Multi-Task Learning

Cited by: 37
Authors
Kim, Jaebok [1 ]
Englebienne, Gwenn [1 ]
Truong, Khiet P. [1 ]
Evers, Vanessa [1 ]
Affiliations
[1] Univ Twente, Human Media Interact, Enschede, Netherlands
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
speech emotion recognition; computational paralinguistics; deep learning;
DOI
10.21437/Interspeech.2017-736
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the challenges in Speech Emotion Recognition (SER) "in the wild" is the large mismatch between training and test data (e.g. speakers and tasks). To improve the generalisation capabilities of emotion models, we propose to use Multi-Task Learning (MTL) with gender and naturalness as auxiliary tasks in deep neural networks. This method was evaluated in within-corpus and various cross-corpus classification experiments that simulate conditions "in the wild". Compared to state-of-the-art Single-Task Learning (STL) methods, our proposed MTL method significantly improved performance. In particular, models using both gender and naturalness achieved larger gains than those using either auxiliary task alone. This benefit was also visible in the high-level feature representations learned by our method, where discriminative emotional clusters could be observed.
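The abstract describes a deep network trained jointly on the main emotion task plus gender and naturalness auxiliary tasks. A minimal sketch of that idea is a shared trunk with three classification heads, with the auxiliary losses added to the emotion loss at a reduced weight. The layer sizes, feature dimension (e.g. an openSMILE-style functional vector), number of emotion classes, and the `aux_weight` value below are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSER(nn.Module):
    """Shared trunk with task-specific heads: emotion (main task),
    gender and naturalness (auxiliary tasks). Hidden sizes are
    illustrative, not taken from the paper."""
    def __init__(self, n_features=88, n_emotions=4):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.emotion_head = nn.Linear(128, n_emotions)
        self.gender_head = nn.Linear(128, 2)       # male / female
        self.naturalness_head = nn.Linear(128, 2)  # acted / spontaneous

    def forward(self, x):
        h = self.shared(x)
        return self.emotion_head(h), self.gender_head(h), self.naturalness_head(h)

def mtl_loss(outputs, targets, aux_weight=0.3):
    """Main emotion loss plus down-weighted auxiliary losses;
    aux_weight is an assumed hyperparameter."""
    emo_out, gen_out, nat_out = outputs
    emo_t, gen_t, nat_t = targets
    return (F.cross_entropy(emo_out, emo_t)
            + aux_weight * (F.cross_entropy(gen_out, gen_t)
                            + F.cross_entropy(nat_out, nat_t)))
```

At training time all three losses are backpropagated through the shared trunk, which is the mechanism by which the auxiliary labels regularise the emotion representation; at test time only the emotion head is used.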
Pages: 1113-1117
Page count: 5