Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition

Cited by: 17

Authors:

Falahzadeh, Mohammad Reza [1]

Farokhi, Fardad [2]

Harimi, Ali [3]

Sabbaghi-Nadooshan, Reza [1]

Affiliations:

[1] Islamic Azad Univ, Dept Elect Engn, Cent Tehran Branch, Tehran, Iran

[2] Islamic Azad Univ, Dept Biomed Engn, Cent Tehran Branch, Tehran, Iran

[3] Islamic Azad Univ, Dept Elect Engn, Shahrood Branch, Shahrood, Iran

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2023 / Vol. 42 / Issue 01

Funding:

UK Research and Innovation (UKRI);

Keywords:

Speech emotion recognition; 3D tensor speech representation; Chaogram; Deep convolutional neural network; Gray wolf optimization algorithm; RECONSTRUCTED PHASE-SPACE; SPECTRAL FEATURES; CLASSIFICATION; CNN;

DOI:

10.1007/s00034-022-02130-3

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Speech emotion recognition (SER), an important method of emotional human-machine interaction, has been the focus of much research in recent years. Motivated by powerful Deep Convolutional Neural Networks (DCNNs) to learn features and the landmark success of these networks in the field of image classification, the present study aimed to prepare a pre-trained DCNN model for SER and provide compatible input to these networks by converting a speech signal into a 3D tensor. First, using a reconstructed phase space, speech samples are reconstructed in a 3D phase space. Studies have shown that the patterns formed in this space contain meaningful emotional features of the speaker. To provide an input that is compatible with DCNN, a new speech signal representation called Chaogram was introduced as the projection of these patterns, and three channels similar to RGB images were obtained. In the next step, image enhancement techniques were used to highlight the details of Chaogram images. Then, the Visual Geometry Group (VGG) DCNN pre-trained on the large ImageNet dataset is utilized to learn Chaogram high-level features and corresponding emotion classes. Finally, transfer learning is performed on the proposed model, and the presented model is fine-tuned on our datasets. To optimize the hyper-parameter arrangement of architecture-determined CNNs, an innovative DCNN-GWO (gray wolf optimization) is also presented. The results of this study on two public datasets of emotions, i.e., EMO-DB and eNTERFACE05, show the promising performance of the proposed model, which can greatly improve SER applications.
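The first two steps of the abstract (reconstructing the signal in a 3D phase space, then projecting the resulting pattern into three image-like channels) can be illustrated with a minimal sketch. This is not the authors' Chaogram implementation: the abstract does not state the embedding delay, the image resolution, or how the projections are formed, so the time-delay embedding, the `delay=8` and `bins=32` values, the choice of the xy, xz and yz planes, and the function names `delay_embed` and `project_to_planes` are all assumptions made for illustration.

```python
import math

def delay_embed(signal, delay=8, dim=3):
    """Time-delay embedding: map a 1D signal to points in dim-D phase space.

    The delay value is an assumed placeholder, not taken from the paper.
    """
    n_points = len(signal) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for the chosen delay and dimension")
    return [tuple(signal[i + k * delay] for k in range(dim))
            for i in range(n_points)]

def project_to_planes(points, bins=32):
    """Project 3D points onto the xy, xz and yz planes as 2D count grids.

    Returns three bins-by-bins grids, stacked like the channels of an image.
    """
    lo = min(min(p) for p in points)
    hi = max(max(p) for p in points)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal

    def to_bin(v):
        # Map a value in [lo, hi] to a bin index in [0, bins - 1].
        return min(int((v - lo) / span * bins), bins - 1)

    channels = [[[0] * bins for _ in range(bins)] for _ in range(3)]
    for x, y, z in points:
        bx, by, bz = to_bin(x), to_bin(y), to_bin(z)
        channels[0][bx][by] += 1  # xy projection
        channels[1][bx][bz] += 1  # xz projection
        channels[2][by][bz] += 1  # yz projection
    return channels

# Toy input: a pure tone stands in for a short speech frame.
signal = [math.sin(2 * math.pi * 5 * t / 400) for t in range(400)]
points = delay_embed(signal, delay=8, dim=3)
image = project_to_planes(points, bins=32)
```

In a pipeline like the one the abstract describes, a three-channel array of this kind would then be enhanced, resized, and passed to an ImageNet-pretrained network for fine-tuning; those stages are omitted here.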


Pages: 449-492

Page count: 44
