Effects of Data Augmentations on Speech Emotion Recognition

Cited by: 17
Authors
Atmaja, Bagus Tris [1 ,2 ]
Sasou, Akira [1 ]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Tsukuba, Ibaraki 3058560, Japan
[2] Inst Teknol Sepuluh Nopember, Surabaya 60111, Indonesia
Keywords
speech emotion recognition; affective computing; data augmentations; wav2vec 2.0; SVM; features
DOI
10.3390/s22165941
CLC Classification Number
O65 [Analytical Chemistry];
Subject Classification Codes
070302; 081704;
Abstract
Data augmentation techniques have recently gained wider adoption in speech processing, including speech emotion recognition. Although more data tends to be more effective, there may be a trade-off in which adding more data no longer yields a better model. This paper reports experiments investigating the effects of data augmentation on speech emotion recognition. The investigation aims at finding the most useful type of data augmentation and the most effective number of augmentations for speech emotion recognition under various conditions. The experiments are conducted on the Japanese Twitter-based emotional speech and IEMOCAP datasets. The results show that for speaker-independent data, two augmentations, glottal source extraction and silence removal, performed best, outperforming configurations that used more augmentation techniques. For text-independent data (including speaker-and-text-independent data), adding more augmentations tends to improve speech emotion recognition performance. The results highlight the trade-off between the number of data augmentations and speech emotion recognition performance, showing the need to choose a proper data augmentation technique for a specific condition.
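As a rough illustration of the silence-removal augmentation mentioned in the abstract, the following minimal sketch removes silent segments from an utterance and keeps the result as an extra training copy. It assumes librosa and NumPy are available; the audio path, 16 kHz sampling rate, and 30 dB silence threshold are illustrative placeholders, not values taken from the paper.

    # Minimal sketch of a silence-removal augmentation (assumed parameters,
    # not the paper's exact configuration).
    import numpy as np
    import librosa

    def remove_silence(waveform, top_db=30):
        """Return the waveform with segments quieter than `top_db` below peak removed."""
        # librosa.effects.split returns (start, end) sample indices of non-silent intervals
        intervals = librosa.effects.split(waveform, top_db=top_db)
        return np.concatenate([waveform[start:end] for start, end in intervals])

    # Hypothetical usage: build a small augmented training set from one clip.
    y, sr = librosa.load("example_utterance.wav", sr=16000)
    augmented = remove_silence(y)
    training_set = [y, augmented]  # original plus one augmented copy

The same pattern, one function per augmentation applied to every original clip, is how multiple augmentation types can be stacked to vary the "number of data augmentations" discussed in the abstract.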
Pages: 14