A Robust Deep Transfer Learning Model for Accurate Speech Emotion Classification

Cited by: 2
Authors
Akinpelu, Samson [1 ]
Viriri, Serestina [1 ]
Affiliations
[1] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, Durban, South Africa
Source
ADVANCES IN VISUAL COMPUTING, ISVC 2022, PT II | 2022 / Vol. 13599
Keywords
Deep learning; Speech emotion; Classification; Deep convolutional neural network; Features
DOI
10.1007/978-3-031-20716-7_33
CLC Number
TP31 [Computer Software]
Subject Classification Code
081202; 0835
Abstract
The significant role of emotion in daily human interaction cannot be over-emphasized; however, building a cutting-edge, highly efficient model for the classification of speech emotion in affective computing remains a challenging task. Researchers have proposed several approaches for speech emotion classification (SEC) in recent times, but the lingering challenge of insufficient datasets, which limits the performance of these approaches, is still a major concern. This work therefore proposes a deep transfer learning model for SEC, a technique that has yielded tremendous, state-of-the-art results in computer vision. Our approach uses a pre-trained and optimized Visual Geometry Group (VGGNet) convolutional neural network architecture with appropriate fine-tuning for optimal performance. The speech signal is converted to a mel-spectrogram image suitable as deep learning model input (224 x 224 x 3) by applying filterbanks and the Fast Fourier Transform (FFT) to the speech samples. A multi-layer perceptron (MLP) is adopted as the classifier after feature extraction is carried out by the deep learning model. Speech pre-processing was carried out on the Toronto Emotional Speech Set (TESS) corpus used for the study to prevent low performance of the proposed model. Evaluation on the TESS dataset shows an improved SEC result, with an accuracy of 96.1% and specificity of 97.4%.
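The abstract's front-end step (filterbanks plus FFT turning a speech signal into a log-mel image) can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the frame size, hop, and mel count below are assumed values for demonstration, and in the paper's pipeline the resulting log-mel matrix would subsequently be rendered and resized to the 224 x 224 x 3 VGG input.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale mapping.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):           # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):           # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=64):
    # Frame the signal, window each frame, take the magnitude FFT,
    # then project the power spectrum onto the mel filterbank.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    spec = np.array(frames).T                      # (n_fft//2 + 1, n_frames)
    mel = mel_filterbank(n_mels, n_fft, sr) @ spec
    return 10.0 * np.log10(mel + 1e-10)            # log scale (dB)

# Usage: one second of a 440 Hz tone as a stand-in for a speech sample.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
S = mel_spectrogram(tone, sr=sr)                   # shape: (n_mels, n_frames)
```

In practice a library such as librosa performs this step, but the sketch shows why the output is a 2-D time-frequency image that a VGG-style network can consume once resized and replicated across three channels.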
Pages: 419-430 (12 pages)