Data Augmentation using GANs for Speech Emotion Recognition

Cited by: 81
Authors
Chatziagapi, Aggelina [1]
Paraskevopoulos, Georgios [1]
Sgouropoulos, Dimitris [1]
Pantazopoulos, Georgios [1]
Nikandrou, Malvina [1]
Giannakopoulos, Theodoros [1]
Katsamanis, Athanasios [1]
Potamianos, Alexandros [1]
Narayanan, Shrikanth [1]
Affiliations
[1] Behav Signal Technol Inc, Los Angeles, CA 90027 USA
Source
INTERSPEECH 2019 | 2019
Keywords
Generative Adversarial Networks; Speech Emotion Recognition; data augmentation; data imbalance; NEURAL-NETWORKS;
DOI
10.21437/Interspeech.2019-2561
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
In this work, we address the problem of data imbalance for the task of Speech Emotion Recognition (SER). We investigate conditioned data augmentation using Generative Adversarial Networks (GANs), in order to generate samples for underrepresented emotions. We adapt and improve a conditional GAN architecture to generate synthetic spectrograms for the minority class. For comparison purposes, we implement a series of signal-based data augmentation methods. The proposed GAN-based approach is evaluated on two datasets, namely IEMOCAP and FEEL-25k, a large multi-domain dataset. Results demonstrate a 10% relative performance improvement in IEMOCAP and 5% in FEEL-25k, when augmenting the minority classes.
Pages: 171-175
Number of pages: 5
Related papers
32 in total
[1]  
Aguiar Rafael L., 2018, 2018 INT JOINT C NEU, P1
[2]  
Aldeneh Z, 2017, INT CONF ACOUST SPEE, P2741, DOI 10.1109/ICASSP.2017.7952655
[3]  
Antoniou Antreas, 2018, Data augmentation generative adversarial networks
[4]   IEMOCAP: interactive emotional dyadic motion capture database [J].
Busso, Carlos ;
Bulut, Murtaza ;
Lee, Chi-Chun ;
Kazemzadeh, Abe ;
Mower, Emily ;
Kim, Samuel ;
Chang, Jeannette N. ;
Lee, Sungbok ;
Narayanan, Shrikanth S. .
LANGUAGE RESOURCES AND EVALUATION, 2008, 42 (04) :335-359
[5]   3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition [J].
Chen, Mingyi ;
He, Xuanji ;
Yang, Jing ;
Zhang, Han .
IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (10) :1440-1444
[6]   AutoAugment: Learning Augmentation Strategies from Data [J].
Cubuk, Ekin D. ;
Zoph, Barret ;
Mane, Dandelion ;
Vasudevan, Vijay ;
Le, Quoc V. .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :113-123
[7]  
Defferrard M., 2017, 18 INT SOC MUSIC INF
[8]  
Donahue C., 2019, INT C LEARNING REPRE
[9]   Survey on speech emotion recognition: Features, classification schemes, and databases [J].
El Ayadi, Moataz ;
Kamel, Mohamed S. ;
Karray, Fakhri .
PATTERN RECOGNITION, 2011, 44 (03) :572-587
[10]  
Etienne C., 2018, P WORKSH SPEECH MUS, P21, DOI 10.21437/SMM.2018-5