CycleGAN-based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition

Cited by: 49
Authors
Bao, Fang [1 ]
Neumann, Michael [1 ]
Ngoc Thang Vu [1 ]
Affiliations
[1] Univ Stuttgart, Inst Nat Language Proc IMS, Stuttgart, Germany
Source
INTERSPEECH 2019 | 2019
Keywords
speech emotion recognition; data augmentation; cycle-consistent generative adversarial networks; CONVERSION;
DOI
10.21437/Interspeech.2019-2293
CLC Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes
100104; 100213
Abstract
Cycle-consistent adversarial networks (CycleGAN) have shown great success in image style transfer with unpaired datasets. Inspired by this, we investigate emotion style transfer to generate synthetic data, aiming to address the data scarcity problem in speech emotion recognition. Specifically, we propose a CycleGAN-based method to transfer feature vectors extracted from a large unlabeled speech corpus into synthetic features representing given target emotions. We extend the CycleGAN framework with a classification loss, which improves the discriminability of the generated data. To show the effectiveness of the proposed method, we present results for speech emotion recognition using the generated feature vectors (i) as augmentation of the training data, and (ii) as a standalone training set. Our experimental results reveal that utilizing synthetic feature vectors improves classification performance in both within-corpus and cross-corpus evaluation.
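The abstract describes a CycleGAN objective extended with a classification loss so that generated feature vectors carry the target emotion. A minimal numpy sketch of such a combined generator objective is given below; the least-squares adversarial term follows the standard CycleGAN formulation, while the loss weights `lam_cyc` and `lam_cls` and all function names are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def lsgan_loss(d_fake):
    # Least-squares adversarial loss for the generator: push the
    # discriminator's outputs on generated features toward 1 ("real").
    return float(np.mean((d_fake - 1.0) ** 2))

def cycle_loss(x, x_reconstructed):
    # L1 cycle-consistency: mapping to the target emotion and back
    # should approximately recover the original feature vector.
    return float(np.mean(np.abs(x - x_reconstructed)))

def classification_loss(logits, target_idx):
    # Cross-entropy of an auxiliary emotion classifier on the generated
    # features; this is the extra term that encourages discriminability.
    z = logits - logits.max()                 # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[target_idx])

def total_generator_loss(d_fake, x, x_rec, logits, target_idx,
                         lam_cyc=10.0, lam_cls=1.0):
    # Combined objective; the weights are illustrative assumptions.
    return (lsgan_loss(d_fake)
            + lam_cyc * cycle_loss(x, x_rec)
            + lam_cls * classification_loss(logits, target_idx))

# Toy example on two-dimensional "feature vectors".
x = np.array([0.5, -1.0])        # original features
x_rec = np.array([0.4, -0.9])    # features after a full cycle
d_fake = np.array([0.8])         # discriminator score on generated features
logits = np.array([2.0, 0.0])    # auxiliary classifier logits
loss = total_generator_loss(d_fake, x, x_rec, logits, target_idx=0)
```

In training, this scalar would be minimized with respect to the generator parameters; the classification term is what distinguishes the extended framework from plain CycleGAN.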
Pages: 2828-2832
Page count: 5