CycleGAN-based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition

Cited by: 49
Authors
Bao, Fang [1 ]
Neumann, Michael [1 ]
Ngoc Thang Vu [1 ]
Affiliations
[1] Univ Stuttgart, Inst Nat Language Proc IMS, Stuttgart, Germany
Source
INTERSPEECH 2019 | 2019
Keywords
speech emotion recognition; data augmentation; cycle-consistent generative adversarial networks; CONVERSION;
DOI
10.21437/Interspeech.2019-2293
CLC Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes
100104; 100213
Abstract
Cycle-consistent adversarial networks (CycleGAN) have shown great success in image style transfer with unpaired datasets. Inspired by this, we investigate emotion style transfer to generate synthetic data, aiming to address the data scarcity problem in speech emotion recognition. Specifically, we propose a CycleGAN-based method to transfer feature vectors extracted from a large unlabeled speech corpus into synthetic features representing given target emotions. We extend the CycleGAN framework with a classification loss, which improves the discriminability of the generated data. To show the effectiveness of the proposed method, we present results for speech emotion recognition using the generated feature vectors (i) as augmentation of the training data, and (ii) as a standalone training set. Our experimental results reveal that utilizing synthetic feature vectors improves classification performance in both within-corpus and cross-corpus evaluation.
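The abstract describes a CycleGAN objective extended with a classification loss so that generated feature vectors carry the target emotion. A minimal numpy sketch of such a combined generator objective is given below; the least-squares adversarial term follows the standard CycleGAN formulation, while the loss weights `lam_cyc` and `lam_cls` and all function names are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def lsgan_loss(d_fake):
    # Least-squares adversarial loss for the generator: push the
    # discriminator's outputs on generated features toward 1 ("real").
    return float(np.mean((d_fake - 1.0) ** 2))

def cycle_loss(x, x_reconstructed):
    # L1 cycle-consistency: mapping to the target emotion and back
    # should approximately recover the original feature vector.
    return float(np.mean(np.abs(x - x_reconstructed)))

def classification_loss(logits, target_idx):
    # Cross-entropy of an auxiliary emotion classifier on the generated
    # features; this is the extra term that encourages discriminability.
    z = logits - logits.max()                 # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[target_idx])

def total_generator_loss(d_fake, x, x_rec, logits, target_idx,
                         lam_cyc=10.0, lam_cls=1.0):
    # Combined objective; the weights are illustrative assumptions.
    return (lsgan_loss(d_fake)
            + lam_cyc * cycle_loss(x, x_rec)
            + lam_cls * classification_loss(logits, target_idx))

# Toy example on two-dimensional "feature vectors".
x = np.array([0.5, -1.0])        # original features
x_rec = np.array([0.4, -0.9])    # features after a full cycle
d_fake = np.array([0.8])         # discriminator score on generated features
logits = np.array([2.0, 0.0])    # auxiliary classifier logits
loss = total_generator_loss(d_fake, x, x_rec, logits, target_idx=0)
```

In training, this scalar would be minimized with respect to the generator parameters; the classification term is what distinguishes the extended framework from plain CycleGAN.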
Pages: 2828-2832
Page count: 5