A hybrid deep feature selection framework for emotion recognition from human speeches

Cited by: 9
Authors
Marik, Aritra [1 ]
Chattopadhyay, Soumitri [1 ]
Singh, Pawan Kumar [1 ]
Affiliations
[1] Jadavpur Univ, Dept Informat Technol, Jadavpur Univ Second Campus, Plot 8, LB Block, Kolkata 700106, W Bengal, India
Funding
UK Research and Innovation (UKRI)
Keywords
Speech emotion recognition; Deep learning; Feature selection; Fuzzy entropy & similarity measures; Whale optimization algorithm; Algorithms
DOI
10.1007/s11042-022-14052-y
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Speech Emotion Recognition (SER) is an active area of signal processing research that aims to identify emotional states from audio speech signals. Applications of SER range from psychological diagnosis to human-computer interaction, so a robust framework is needed for accurate classification. To this end, we propose a two-stage hybrid deep feature selection (HDFS) framework that combines deep learning with automated feature engineering for emotion recognition from human speech, performing well in terms of both accuracy and computational efficiency. Our pipeline extracts self-learned features from mel-spectrograms of raw audio signals using a customized Wide-ResNet-50-2 deep learning model; the dimensionality of these features is then reduced by a hybrid deep feature selection algorithm comprising a fuzzy entropy and similarity-based feature ranking method, followed by the Whale Optimization Algorithm, a popular meta-heuristic optimizer in the literature. A k-nearest neighbor classifier assigns the optimized feature subset to the respective emotion classes. The proposed pipeline is evaluated on three publicly available SER datasets using a 5-fold cross-validation scheme, where it outperforms several state-of-the-art works in the literature by significant margins, justifying the effectiveness and reliability of the proposed approach. The source codes of the proposed method can be found at: .
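The following is a minimal sketch of the two-stage pipeline the abstract describes, assuming Python with librosa, PyTorch/torchvision, and scikit-learn. All function names are illustrative: the Fisher-style separability score merely stands in for the paper's fuzzy entropy & similarity-based ranking, the binary whale optimizer is a heavily simplified rendition of WOA, and the cutoff and neighbor counts are assumed values, not taken from the paper.

```python
# Illustrative sketch of the HDFS pipeline; not the authors' exact code.
import numpy as np
import librosa
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score


def extract_deep_features(wav_paths, sr=16000):
    """Stage 1: mel-spectrogram -> Wide-ResNet-50-2 embedding (2048-D)."""
    backbone = models.wide_resnet50_2(weights="IMAGENET1K_V1")
    backbone.fc = torch.nn.Identity()              # keep penultimate features
    backbone.eval()
    feats = []
    with torch.no_grad():
        for path in wav_paths:
            y, _ = librosa.load(path, sr=sr)
            mel = librosa.power_to_db(
                librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128))
            x = torch.tensor(mel, dtype=torch.float32)[None, None]  # 1x1xHxW
            x = TF.resize(x, [224, 224]).repeat(1, 3, 1, 1)  # fake 3 channels
            feats.append(backbone(x).squeeze(0).numpy())
    return np.stack(feats)


def rank_features(X, y, keep=512):
    """Stage 2a (surrogate): a Fisher-style score stands in for the paper's
    fuzzy entropy & similarity-based ranking; keeps the top `keep` features."""
    mu, var = X.mean(0), X.var(0) + 1e-9
    score = sum(((X[y == c].mean(0) - mu) ** 2) / var for c in np.unique(y))
    return np.argsort(score)[::-1][:keep]


def fitness(mask, X, y):
    """5-fold CV accuracy of kNN on the selected subset, minus a size penalty."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    return acc - 0.01 * mask.mean()


def binary_woa(X, y, n_whales=10, iters=20, seed=0):
    """Stage 2b (simplified): binary Whale Optimization over feature masks."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = (rng.random((n_whales, d)) > 0.5).astype(float)
    best = max(pop, key=lambda m: fitness(m, X, y)).copy()
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters          # linearly decreasing coefficient
        for i in range(n_whales):
            A = 2 * a * rng.random(d) - a
            C = 2 * rng.random(d)
            if rng.random() < 0.5:         # encircling-prey move
                step = best - A * np.abs(C * best - pop[i])
            else:                          # spiral bubble-net move
                l = rng.uniform(-1, 1, d)
                step = np.abs(best - pop[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            prob = 1.0 / (1.0 + np.exp(-step))   # sigmoid transfer -> binary
            pop[i] = (rng.random(d) < prob).astype(float)
            if fitness(pop[i], X, y) > fitness(best, X, y):
                best = pop[i].copy()
    return best.astype(bool)
```

Wiring the stages together on a hypothetical file list and label array: feats = extract_deep_features(paths), idx = rank_features(feats, labels), mask = binary_woa(feats[:, idx], labels), with the final kNN evaluated via 5-fold cross-validation on feats[:, idx][:, mask]. The k = 5 neighbors, the 512-feature ranking cutoff, and the 0.01 subset-size penalty are assumptions made for this sketch.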
Pages: 11461-11487
Number of pages: 27