Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition

Cited by: 17

Authors:

Falahzadeh, Mohammad Reza [1]

Farokhi, Fardad [2]

Harimi, Ali [3]

Sabbaghi-Nadooshan, Reza [1]

Affiliations:

[1] Islamic Azad Univ, Dept Elect Engn, Cent Tehran Branch, Tehran, Iran

[2] Islamic Azad Univ, Dept Biomed Engn, Cent Tehran Branch, Tehran, Iran

[3] Islamic Azad Univ, Dept Elect Engn, Shahrood Branch, Shahrood, Iran

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2023 / Vol. 42 / Issue 01

Funding:

UK Research and Innovation;

Keywords:

Speech emotion recognition; 3D tensor speech representation; Chaogram; Deep convolutional neural network; Gray wolf optimization algorithm; RECONSTRUCTED PHASE-SPACE; SPECTRAL FEATURES; CLASSIFICATION; CNN;

DOI:

10.1007/s00034-022-02130-3

Chinese Library Classification number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Speech emotion recognition (SER), an important method of emotional human-machine interaction, has been the focus of much research in recent years. Motivated by the ability of deep convolutional neural networks (DCNNs) to learn features and by the landmark success of these networks in image classification, the present study aims to prepare a pre-trained DCNN model for SER and to provide compatible input to the network by converting a speech signal into a 3D tensor. First, speech samples are reconstructed in a 3D phase space. Studies have shown that the patterns formed in this space contain meaningful emotional features of the speaker. To provide an input that is compatible with a DCNN, a new speech signal representation called Chaogram is introduced as the projection of these patterns, yielding three channels similar to those of an RGB image. Next, image enhancement techniques are used to highlight the details of the Chaogram images. Then, the Visual Geometry Group (VGG) DCNN pre-trained on the large ImageNet dataset is used to learn high-level Chaogram features and the corresponding emotion classes. Finally, transfer learning is performed on the proposed model, which is fine-tuned on the target datasets. To optimize the hyper-parameter configuration of CNNs with a predetermined architecture, a DCNN-GWO (gray wolf optimization) scheme is also presented. Results on two public emotion datasets, EMO-DB and eNTERFACE05, show promising performance of the proposed model, which could benefit SER applications.
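
The phase-space step described in the abstract can be illustrated with a minimal sketch. It assumes a standard time-delay embedding and uses 2D histograms of the three coordinate-plane projections as the image channels; the abstract does not specify the delay, the image size, or the exact projection used for the Chaogram, so the function names (`phase_space_embed`, `chaogram_like`) and the `delay` and `bins` parameters are hypothetical, and the paper's actual construction may differ.

```python
import numpy as np


def phase_space_embed(signal, delay=1, dim=3):
    """Time-delay embedding: row n is (x[n], x[n+delay], ..., x[n+(dim-1)*delay])."""
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this delay and dimension")
    columns = [x[i * delay : i * delay + n_points] for i in range(dim)]
    return np.stack(columns, axis=1)


def chaogram_like(signal, delay=1, bins=64):
    """Project the 3D trajectory onto its three coordinate planes.

    Each projection becomes a 2D occupancy histogram scaled to [0, 1];
    the three histograms are stacked as channels, giving a
    (bins, bins, 3) tensor shaped like an RGB image.
    This is an assumed construction, not the paper's exact recipe.
    """
    pts = phase_space_embed(signal, delay=delay, dim=3)
    lo, hi = pts.min(), pts.max()
    if hi == lo:  # constant signal: avoid a zero-width histogram range
        hi = lo + 1.0
    channels = []
    for a, b in ((0, 1), (1, 2), (0, 2)):
        hist, _, _ = np.histogram2d(
            pts[:, a], pts[:, b], bins=bins, range=[[lo, hi], [lo, hi]]
        )
        peak = hist.max()
        channels.append(hist / peak if peak > 0 else hist)
    return np.stack(channels, axis=-1)


if __name__ == "__main__":
    # Synthetic 220 Hz tone sampled at 16 kHz stands in for a speech frame.
    t = np.linspace(0.0, 1.0, 16000, endpoint=False)
    tone = np.sin(2 * np.pi * 220 * t)
    image = chaogram_like(tone, delay=8, bins=64)
    print(image.shape)
```

A tensor of this shape could then be resized and passed to an ImageNet-pretrained network such as VGG for fine-tuning, which is the role the abstract assigns to the Chaogram.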

Pages: 449-492

Page count: 44
