Speech emotion recognition in Persian based on stacked autoencoder by comparing local and global features

Cited by: 0
Authors
Azam Bastanfard
Alireza Abbasian
Affiliations
[1] Islamic Azad University, Department of Computer Engineering, Karaj Branch
[2] Islamic Republic of Iran Broadcasting University, Faculty of Media Engineering
Source
Multimedia Tools and Applications | 2023, Vol. 82
Keywords
Speech emotion recognition; Stacked autoencoder; Persian language; Deep learning
DOI: Not available
Abstract
Among the barriers to establishing effective human-machine interaction is the machine's inability to properly distinguish emotions in the human voice. Speech Emotion Recognition (SER) systems have emerged to tackle this limitation. The accuracy of these systems depends on several factors, such as the quantity and types of emotions included in the database, the feature extraction process (covering both local and global features), the feature selection method, and the type of classifier. This study presents a methodology for speech emotion recognition using a stacked autoencoder neural network and shows that it is well suited to this classification task. Speech emotion recognition is performed on the Persian Emotional Speech Database (Persian ESD), which includes six emotional states: happiness, sadness, fear, disgust, anger, and neutral. Moreover, the widely used Berlin Emotional Database (EMO-DB) is employed to evaluate the effectiveness of the proposed approach. The experimental results show that the proposed method significantly improves recognition accuracy.
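To make the described pipeline concrete, the sketch below shows one way such a system could be assembled: frame-level ("local") MFCCs summarized into utterance-level ("global") statistics, a stacked autoencoder pretrained greedily layer by layer, and a softmax head over the six Persian ESD emotion classes. This is an illustrative sketch under stated assumptions, not the authors' implementation; the MFCC feature set, layer sizes, and the wav_paths/labels variables are hypothetical.

```python
# Minimal illustrative sketch (not the authors' implementation) of a stacked-
# autoencoder SER pipeline on utterance-level acoustic features.
import numpy as np
import librosa
from tensorflow.keras import layers, models

EMOTIONS = ["happiness", "sadness", "fear", "disgust", "anger", "neutral"]

def utterance_features(wav_path, sr=16000, n_mfcc=13):
    """Local features: frame-level MFCCs; global features: their mean and std."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # shape (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # shape (2 * n_mfcc,)

def pretrain_stacked_encoder(X, hidden_dims=(64, 32, 16), epochs=30):
    """Greedy layer-wise pretraining: each hidden layer is trained as a small
    autoencoder on the previous layer's codes; the trained encoder layers are kept."""
    encoder_layers, codes = [], X
    for dim in hidden_dims:
        inp = layers.Input(shape=(codes.shape[1],))
        enc_layer = layers.Dense(dim, activation="relu")
        decoded = layers.Dense(codes.shape[1], activation="linear")(enc_layer(inp))
        ae = models.Model(inp, decoded)
        ae.compile(optimizer="adam", loss="mse")
        ae.fit(codes, codes, epochs=epochs, batch_size=16, verbose=0)
        codes = models.Model(inp, enc_layer(inp)).predict(codes, verbose=0)
        encoder_layers.append(enc_layer)  # reuse the trained weights in the classifier
    return encoder_layers

def build_classifier(input_dim, encoder_layers, n_classes=len(EMOTIONS)):
    """Stack the pretrained encoders and fine-tune end to end with a softmax head."""
    inp = layers.Input(shape=(input_dim,))
    x = inp
    for enc in encoder_layers:
        x = enc(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical usage (wav_paths and labels are placeholders for a labeled corpus):
# X = np.stack([utterance_features(p) for p in wav_paths])
# encoders = pretrain_stacked_encoder(X)
# clf = build_classifier(X.shape[1], encoders)
# clf.fit(X, np.asarray(labels), epochs=50, batch_size=16)
```

Greedy layer-wise pretraining followed by supervised fine-tuning is one common reading of "stacked autoencoder"; an end-to-end autoencoder with a classification head would be an equally plausible variant.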
Pages: 36413–36430
Page count: 17