Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition

Cited by: 17

Authors:

Falahzadeh, Mohammad Reza [1]

Farokhi, Fardad [2]

Harimi, Ali [3]

Sabbaghi-Nadooshan, Reza [1]

Affiliations:

[1] Islamic Azad Univ, Dept Elect Engn, Cent Tehran Branch, Tehran, Iran

[2] Islamic Azad Univ, Dept Biomed Engn, Cent Tehran Branch, Tehran, Iran

[3] Islamic Azad Univ, Dept Elect Engn, Shahrood Branch, Shahrood, Iran

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2023 / Vol. 42 / No. 01

Funding:

UK Research and Innovation;

Keywords:

Speech emotion recognition; 3D tensor speech representation; Chaogram; Deep convolutional neural network; Gray wolf optimization algorithm; RECONSTRUCTED PHASE-SPACE; SPECTRAL FEATURES; CLASSIFICATION; CNN;

DOI:

10.1007/s00034-022-02130-3

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Speech emotion recognition (SER), an important method of emotional human-machine interaction, has been the focus of much research in recent years. Motivated by the ability of Deep Convolutional Neural Networks (DCNNs) to learn features and by the landmark success of these networks in image classification, the present study adapts a pre-trained DCNN model to SER and provides compatible input to the network by converting a speech signal into a 3D tensor. First, speech samples are reconstructed in a 3D reconstructed phase space. Studies have shown that the patterns formed in this space contain meaningful emotional features of the speaker. To provide an input that is compatible with a DCNN, a new speech signal representation called Chaogram is introduced as the projection of these patterns, yielding three channels similar to those of an RGB image. Next, image enhancement techniques are used to highlight the details of the Chaogram images. Then, the Visual Geometry Group (VGG) DCNN pre-trained on the large ImageNet dataset is used to learn high-level Chaogram features and the corresponding emotion classes. Finally, transfer learning is applied to the proposed model, which is fine-tuned on the study's datasets. To optimize the hyper-parameters of CNNs whose architecture is already fixed, an innovative DCNN-GWO (gray wolf optimization) scheme is also presented. Results on two public emotion datasets, EMO-DB and eNTERFACE05, show promising performance of the proposed model for SER applications.
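
The first two steps of the abstract (phase-space reconstruction, then projection to a three-channel image) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the embedding delay, the image size, or how the projections are formed, so the `delay` and `bins` parameters, the use of 2D histograms on the three coordinate planes, and both function names are assumptions chosen for illustration only.

```python
import numpy as np


def reconstruct_phase_space(signal, delay=8, dim=3):
    """Time-delay embedding: row n is (x[n], x[n+delay], ..., x[n+(dim-1)*delay]).

    `delay` is a hypothetical value; the abstract does not state one.
    """
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for the chosen delay and dimension")
    return np.stack(
        [x[i * delay : i * delay + n_points] for i in range(dim)], axis=1
    )


def chaogram_like_image(points, bins=64):
    """Project a 3D trajectory onto its three coordinate planes.

    Each projection is a 2D occupancy histogram scaled to [0, 1]; the three
    histograms are stacked as channels, analogous to an RGB image. This is an
    assumed projection scheme, not one confirmed by the abstract.
    """
    lo, hi = points.min(), points.max()
    if hi == lo:  # constant signal: avoid a zero-width histogram range
        hi = lo + 1.0
    channels = []
    for a, b in [(0, 1), (0, 2), (1, 2)]:
        hist, _, _ = np.histogram2d(
            points[:, a], points[:, b], bins=bins, range=[[lo, hi], [lo, hi]]
        )
        peak = hist.max()
        channels.append(hist / peak if peak > 0 else hist)
    return np.stack(channels, axis=-1)  # shape: (bins, bins, 3)


# Usage on a synthetic frame (a sine wave standing in for a speech frame).
t = np.arange(2000)
frame = np.sin(2 * np.pi * t / 50)
points = reconstruct_phase_space(frame, delay=8)
image = chaogram_like_image(points, bins=64)
```

An array of this shape could then be resized and passed to an ImageNet-pre-trained network for fine-tuning, which is the role the abstract assigns to VGG; the image enhancement and gray wolf optimization steps are not covered by this sketch.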


Pages: 449 - 492

Page count: 44
