Research on speech emotion recognition algorithm for unbalanced data set

Cited by: 0
Authors
Liang Z. [1 ]
Li X. [1 ]
Song W. [1 ]
Affiliations
[1] Electronic Information Engineering, Changchun University of Science and Technology, Jilin Province
Keywords
CRNN; focal loss; spectrograms; speech emotion recognition
DOI
10.3233/JIFS-191129
Abstract
In speech emotion recognition, most emotional corpora suffer from inconsistent sample lengths and imbalanced sample categories. To address these problems, this paper proposes a variable-length-input CRNN deep learning model based on focal loss for recognizing anger, happiness, neutrality and sadness in the IEMOCAP emotional corpus. In this model, a variable-length strategy is first introduced: the spectrograms of the padded speech samples are fed into a CNN. The effective part of each input sequence is then preserved and passed on by means of a masking matrix and the convolution layers. Next, this effective output is fed into a BiGRU network for learning. Finally, focal loss is used for network training, controlling and adjusting the contribution of each class of samples to the total loss. Simulations show that, compared with traditional speech emotion recognition models, our method effectively improves recognition accuracy and overall performance. © 2020 - IOS Press and the authors. All rights reserved.
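The abstract's key ingredient for handling class imbalance is the focal loss. As a minimal NumPy sketch of the standard multi-class formulation (the function name, signature, and per-class `alpha` weighting are illustrative assumptions, not the authors' exact implementation):

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=None):
    """Multi-class focal loss (illustrative sketch).

    probs  : (N, C) predicted class probabilities
    labels : (N,) integer ground-truth class indices
    gamma  : focusing parameter; gamma = 0 reduces to cross-entropy
    alpha  : optional (C,) per-class weights for imbalanced data
    """
    probs = np.clip(probs, 1e-8, 1.0)                 # avoid log(0)
    p_t = probs[np.arange(len(labels)), labels]       # probability of the true class
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)      # down-weight easy (high p_t) samples
    if alpha is not None:
        loss *= np.asarray(alpha)[labels]             # up-weight rare classes
    return loss.mean()
```

With `gamma = 0` and no `alpha`, this is ordinary cross-entropy; raising `gamma` shrinks the loss contribution of well-classified (easy, often majority-class) samples, which is how the total loss is rebalanced toward hard and minority-class examples.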
Pages: 2791-2796
Related papers (50 in total)
[41] Puterka, Boris; Kacur, Juraj; Pavlovicova, Jarmila. Windowing for Speech Emotion Recognition [J]. 2019 61st International Symposium ELMAR, 2019: 147-150.
[42] Han, Wen-Jing; Li, Hai-Feng; Ruan, Hua-Bin; Ma, Lin. Review on speech emotion recognition [J]. Ruan Jian Xue Bao/Journal of Software, 2014, 25(1): 37-50.
[43] Yan, Yu; Shen, Xizhong. Research on Speech Emotion Recognition Based on AA-CBGRU Network [J]. Electronics, 2022, 11(9).
[44] Trabelsi, Imen; Perotto, Filipo Studzinski; Malik, Usman. Training universal background models with restricted data for speech emotion recognition [J]. Journal of Ambient Intelligence and Humanized Computing, 2021, 13(10): 4787-4797.
[45] Braunschweiler, Norbert; Doddipatla, Rama; Keizer, Simon; Stoyanchev, Svetlana. A Study on Cross-Corpus Speech Emotion Recognition and Data Augmentation [J]. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021: 24-30.
[46] Trabelsi, Imen; Perotto, Filipo Studzinski; Malik, Usman. Training universal background models with restricted data for speech emotion recognition [J]. Journal of Ambient Intelligence and Humanized Computing, 2022, 13: 4787-4797.
[47] Tao, Huawei; Shan, Shuai; Hu, Ziyi; Zhu, Chunhua; Ge, Hongyi. Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation [J]. Entropy, 2023, 25(1).
[48] Wang, Shijun; Hemati, Hamed; Gudnason, Jon; Borth, Damian. Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition [J]. Interspeech 2022, 2022: 391-395.
[49] Ibrahim, Karim M.; Perzol, Antony; Leglaive, Simon. Towards Improving Speech Emotion Recognition Using Synthetic Data Augmentation from Emotion Conversion [J]. 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024), 2024: 10636-10640.
[50] Zhu, Zijiang; Dai, Weihuang; Hu, Yi; Li, Junshan. Speech emotion recognition model based on Bi-GRU and Focal Loss [J]. Pattern Recognition Letters, 2020, 140: 358-365.