Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition

Cited by: 17
Authors
Falahzadeh, Mohammad Reza [1]
Farokhi, Fardad [2]
Harimi, Ali [3]
Sabbaghi-Nadooshan, Reza [1]
Affiliations
[1] Islamic Azad Univ, Dept Elect Engn, Cent Tehran Branch, Tehran, Iran
[2] Islamic Azad Univ, Dept Biomed Engn, Cent Tehran Branch, Tehran, Iran
[3] Islamic Azad Univ, Dept Elect Engn, Shahrood Branch, Shahrood, Iran
Keywords
Speech emotion recognition; 3D tensor speech representation; Chaogram; Deep convolutional neural network; Gray wolf optimization algorithm; RECONSTRUCTED PHASE-SPACE; SPECTRAL FEATURES; CLASSIFICATION; CNN
DOI
10.1007/s00034-022-02130-3
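Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract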
Speech emotion recognition (SER), an important means of emotional human-machine interaction, has been the focus of much research in recent years. Motivated by the ability of deep convolutional neural networks (DCNNs) to learn features and by their landmark success in image classification, this study adapts a pre-trained DCNN model to SER and provides compatible input to the network by converting a speech signal into a 3D tensor. First, speech samples are reconstructed in a 3D phase space. Studies have shown that the patterns formed in this space contain meaningful emotional features of the speaker. To provide an input that is compatible with a DCNN, a new speech signal representation called Chaogram is introduced as the projection of these patterns, yielding three channels similar to those of an RGB image. Next, image enhancement techniques are used to highlight the details of the Chaogram images. Then, the Visual Geometry Group (VGG) DCNN, pre-trained on the large ImageNet dataset, is used to learn high-level Chaogram features and the corresponding emotion classes. Finally, transfer learning is applied to the proposed model, which is fine-tuned on the study's datasets. To optimize the hyper-parameters of CNNs whose architecture is already fixed, an innovative DCNN-GWO (gray wolf optimization) approach is also presented. Results on two public emotion datasets, EMO-DB and eNTERFACE05, show promising performance of the proposed model for SER applications.
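As a rough illustration of the phase-space step described in the abstract, the sketch below embeds a 1D signal in 3D by time delay and projects the resulting points onto the three coordinate planes to obtain three image-like channels. It is not the authors' implementation: the function names (`delay_embed`, `project_to_image`, `chaogram_like`), the delay `tau`, the grid size `bins`, and the use of simple occupancy counts per cell are all assumptions made for illustration, and the image enhancement step is omitted.

```python
import math


def delay_embed(signal, tau):
    """Embed a 1D signal in 3D: point n is (x[n], x[n + tau], x[n + 2*tau])."""
    n_points = len(signal) - 2 * tau
    if n_points <= 0:
        raise ValueError("signal too short for this delay")
    return [(signal[n], signal[n + tau], signal[n + 2 * tau])
            for n in range(n_points)]


def project_to_image(points, axis_a, axis_b, bins):
    """Count the points falling in each cell of a bins x bins grid on one plane."""
    lo = min(min(p) for p in points)
    hi = max(max(p) for p in points)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    image = [[0] * bins for _ in range(bins)]
    for p in points:
        # Map each coordinate to a grid index, clipping the maximum value
        # into the last cell.
        row = min(int((p[axis_a] - lo) / span * bins), bins - 1)
        col = min(int((p[axis_b] - lo) / span * bins), bins - 1)
        image[row][col] += 1
    return image


def chaogram_like(signal, tau=4, bins=32):
    """Return three channels, one per coordinate plane (assumed layout)."""
    points = delay_embed(signal, tau)
    planes = [(0, 1), (0, 2), (1, 2)]
    return [project_to_image(points, a, b, bins) for a, b in planes]


# Usage on a synthetic sine wave standing in for a speech frame.
signal = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
channels = chaogram_like(signal, tau=4, bins=32)
print(len(channels), len(channels[0]), len(channels[0][0]))
```

The three channels share one value range so that the planes stay comparable; a real pipeline would likely normalize the counts to pixel intensities and resize the result to the input size expected by the pre-trained network.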
Pages: 449 - 492
Page count: 44
Related papers
50 in total
  • [31] Optimizing Speech Emotion Recognition with Hilbert Curve and convolutional neural network
    Yang, Zijun
    Zhou, Shi
    Zhang, Lifeng
    Serikawa, Seiichi
    Cognitive Robotics, 2024, 4 : 30 - 41
  • [32] Speech Emotion Recognition in Neurological Disorders Using Convolutional Neural Network
    Zisad, Sharif Noor
    Hossain, Mohammad Shahadat
    Andersson, Karl
    BRAIN INFORMATICS, BI 2020, 2020, 12241 : 287 - 296
  • [33] Speech Emotion Recognition through Hybrid Features and Convolutional Neural Network
    Alluhaidan, Ala Saleh
    Saidani, Oumaima
    Jahangir, Rashid
    Nauman, Muhammad Asif
    Neffati, Omnia Saidani
    APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [34] Convolutional Neural Network with Spectrogram and Perceptual Features for Speech Emotion Recognition
    Zhang, Linjuan
    Wang, Longbiao
    Dang, Jianwu
    Guo, Lili
    Guan, Haotian
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT IV, 2018, 11304 : 62 - 71
  • [35] Constructing Speech Emotion Recognition Model Based on Convolutional Neural Network
    Kuo, Jong-Yih
    Chen, Zhao-Ming
    Lin, Hui-Chi
    2021 28TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE WORKSHOPS (APSECW 2021), 2021, : 52 - 56
  • [36] A Study on Speech Emotion Recognition Using a Deep Neural Network
    Lee, Kyong Hee
    Choi, Hyun Kyun
    Jang, Byung Tae
    Kim, Do Hyun
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019, : 1162 - 1165
  • [37] Transfer Learning of Deep Neural Network for Speech Emotion Recognition
    Huang, Ying
    Hu, Mingqing
    Yu, Xianguo
    Wang, Tao
    Yang, Chen
    PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 721 - 729
  • [38] An optimized facial emotion recognition architecture based on a deep convolutional neural network and genetic algorithm
    Aghabeigi, Fereshteh
    Nazari, Sara
    Eraghi, Nafiseh Osati
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (02) : 1119 - 1129
  • [39] Audiovisual speech recognition based on a deep convolutional neural network
    Rudregowda S.
    Patilkulkarni S.
    Ravi V.
    H.L. G.
    Krichen M.
    Data Science and Management, 2024, 7 (01) : 25 - 34