Performance of deer hunting optimization based deep learning algorithm for speech emotion recognition

Cited by: 31

Authors:

Agarwal, Gaurav [1]

Om, Hari [1]

Affiliations:

[1] IIT ISM, Dept Comp Sci & Engn, Dhanbad 826004, Jharkhand, India

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2021 / Vol. 80 / Issue 07

Keywords:

Speech emotion recognition; Adaptive wavelet transform; Modified galactic swarm optimization; Adaptive sunflower optimization algorithm; Optimized deep neural network; Deer hunting optimization algorithm; IDENTIFICATION; SYSTEM; VOICE;

DOI:

10.1007/s11042-020-10118-x

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper proposes a speech emotion recognition technique based on an optimized deep neural network. The speech signals are denoised using a novel adaptive wavelet transform with a modified galactic swarm optimization algorithm (AWT_MGSO). From the denoised speech signals, spectral features such as LPC (linear prediction coefficients), MFCC (Mel-frequency cepstral coefficients), and PSD (power spectral density), and prosodic features such as energy, entropy, formant frequencies, and pitch are extracted, and a subset of these features is selected by ASFO (Adaptive Sunflower Optimization algorithm). An optimized DNN-DHO (deep neural network with Deer Hunting Optimization algorithm) is proposed for emotion classification, and an enhanced squirrel search algorithm is proposed to update the weights in the DNN-DHO classifier. In this study, all eight emotions in speech from the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) and TESS (Toronto Emotional Speech Set) databases for English and the IITKGP-SEHSC (Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus) database for Hindi are classified. The experimental results are compared with those of classifiers such as DNN-DHO, DNN (Deep Neural Network), and DAE (Deep Auto Encoder). The results show that the proposed algorithm achieves a maximum accuracy of 97.85% on the TESS dataset, 97.14% on the RAVDESS dataset, and 93.75% on the IITKGP-SEHSC dataset with the DNN-DHO classifier.
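
For illustration only, three of the frame-level quantities named in the abstract (energy, PSD, and entropy) could be computed roughly as sketched below. This record does not describe the authors' implementation, so the function names, the 400-sample frame length, the 160-sample hop, the Hann window, and the choice of spectral entropy are all assumptions made for this sketch.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames, one frame per row.
    frame_len and hop are assumed values, not taken from the paper."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def frame_features(x, frame_len=400, hop=160):
    """Return per-frame energy, periodogram PSD, and spectral entropy (bits)."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    energy = np.sum(frames ** 2, axis=1)
    # Simple periodogram estimate of the power spectral density.
    psd = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len
    # Normalize each frame's PSD to a probability distribution over bins.
    p = psd / (psd.sum(axis=1, keepdims=True) + 1e-12)
    entropy = -np.sum(p * np.log2(p + 1e-12), axis=1)
    return energy, psd, entropy

# Synthetic one-second test signals at an assumed 16 kHz sampling rate.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)
noise = np.random.default_rng(0).standard_normal(sr)

e_tone, psd_tone, h_tone = frame_features(tone)
e_noise, psd_noise, h_noise = frame_features(noise)
```

In this sketch a pure tone concentrates its power in a few frequency bins and so has a lower spectral entropy than white noise, which is the kind of contrast such a feature is meant to capture.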

Pages: 9961-9992

Page count: 32

50 items in total

[41] Speech Emotion Recognition Using Deep Learning Transfer Models and Explainable Techniques
Kim, Tae-Wan
Kwak, Keun-Chang
APPLIED SCIENCES-BASEL, 2024, 14 (04)
[42] Speech emotion recognition using feature fusion: a hybrid approach to deep learning
Khan, Waleed Akram
ul Qudous, Hamad
Farhan, Asma Ahmad
MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (31) : 75557 - 75584
[43] Feature Learning via Deep Belief Network for Chinese Speech Emotion Recognition
Zhang, Shiqing
Zhao, Xiaoming
Chuang, Yuelong
Guo, Wenping
Chen, Ying
PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 645 - 651
[44] Analysis of Deep Learning Architectures for Cross-corpus Speech Emotion Recognition
Parry, Jack
Palaz, Dimitri
Clarke, Georgia
Lecomte, Pauline
Mead, Rebecca
Berger, Michael
Hofer, Gregor
INTERSPEECH 2019, 2019 : 1656 - 1660
[45] Unsupervised Feature Learning for Speech Emotion Recognition Based on Autoencoder
Ying, Yangwei
Tu, Yuanwu
Zhou, Hong
ELECTRONICS, 2021, 10 (17)
[46] Deep scattering network for speech emotion recognition
Singh, Premjeet
Saha, Goutam
Sahidullah, Md
29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021 : 131 - 135
[47] Speech emotion recognition based on emotion perception
Liu, Gang
Cai, Shifang
Wang, Ce
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
[48] Performance Improvement in Speech Based Emotion Recognition With DWT and ANOVA
Shinde, Ashwini S.
Patil, Vaishali V.
COMMUNICATIONS IN MATHEMATICS AND APPLICATIONS, 2023, 14 (03) : 1189 - 1198
[49] Deep learning and SVM-based emotion recognition from Chinese speech for smart affective services
Zhang, Weishan
Zhao, Dehai
Chai, Zhi
Yang, Laurence T.
Liu, Xin
Gong, Faming
Yang, Su
SOFTWARE-PRACTICE & EXPERIENCE, 2017, 47 (08) : 1127 - 1138
[50] SPEECH EMOTION RECOGNITION WITH LOCAL-GLOBAL AWARE DEEP REPRESENTATION LEARNING
Liu, Jiaxing
Liu, Zhilei
Wang, Longbiao
Guo, Lili
Dang, Jianwu
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020 : 7174 - 7178
