Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition

Cited by: 17
Authors
Falahzadeh, Mohammad Reza [1]
Farokhi, Fardad [2]
Harimi, Ali [3]
Sabbaghi-Nadooshan, Reza [1]
Affiliations
[1] Islamic Azad Univ, Dept Elect Engn, Cent Tehran Branch, Tehran, Iran
[2] Islamic Azad Univ, Dept Biomed Engn, Cent Tehran Branch, Tehran, Iran
[3] Islamic Azad Univ, Dept Elect Engn, Shahrood Branch, Shahrood, Iran
Funding
UK Research and Innovation (UKRI)
Keywords
Speech emotion recognition; 3D tensor speech representation; Chaogram; Deep convolutional neural network; Gray wolf optimization algorithm; RECONSTRUCTED PHASE-SPACE; SPECTRAL FEATURES; CLASSIFICATION; CNN;
DOI
10.1007/s00034-022-02130-3
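Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract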
Speech emotion recognition (SER), an important means of emotional human-machine interaction, has been the focus of much research in recent years. Motivated by the ability of deep convolutional neural networks (DCNNs) to learn features and by the landmark success of these networks in image classification, the present study adapts a pre-trained DCNN model to SER and provides compatible input to the network by converting the speech signal into a 3D tensor. First, speech samples are reconstructed in a 3D phase space using a reconstructed phase space. Studies have shown that the patterns formed in this space contain meaningful emotional features of the speaker. To provide an input that is compatible with a DCNN, a new speech signal representation called Chaogram is introduced as the projection of these patterns, yielding three channels similar to those of an RGB image. Next, image enhancement techniques are used to highlight the details of the Chaogram images. Then, the Visual Geometry Group (VGG) DCNN pre-trained on the large ImageNet dataset is used to learn high-level Chaogram features and the corresponding emotion classes. Finally, transfer learning is applied to the proposed model, which is fine-tuned on the target datasets. To optimize the hyper-parameter settings of CNNs whose architecture is fixed, a DCNN-GWO (gray wolf optimization) approach is also presented. Results on two public emotion datasets, EMO-DB and eNTERFACE05, show the promising performance of the proposed model, which can greatly improve SER applications.
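The first two steps of the abstract (phase-space reconstruction, then projection into three image-like channels) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the embedding delay, the image resolution, or how the projection is computed, so the time-delay embedding, the `delay` and `bins` parameters, the choice of the three coordinate planes, and the function names `delay_embed`, `project_to_image`, and `chaogram_like` are all assumptions made here for illustration.

```python
import math


def delay_embed(signal, delay=1):
    """Time-delay embedding: map x[n] to the 3D point (x[n], x[n+delay], x[n+2*delay]).

    Assumption: the abstract only says "reconstructed phase space"; a plain
    delay embedding with a hand-picked delay is used here.
    """
    count = len(signal) - 2 * delay
    if count <= 0:
        raise ValueError("signal is too short for the chosen delay")
    return [(signal[n], signal[n + delay], signal[n + 2 * delay]) for n in range(count)]


def project_to_image(points, axis_a, axis_b, bins=32):
    """Occupancy histogram of the points on one coordinate plane, scaled to [0, 1]."""
    low = min(min(p) for p in points)
    high = max(max(p) for p in points)
    span = (high - low) or 1.0  # avoid division by zero for a constant signal
    image = [[0] * bins for _ in range(bins)]
    for p in points:
        row = min(int((p[axis_a] - low) / span * bins), bins - 1)
        col = min(int((p[axis_b] - low) / span * bins), bins - 1)
        image[row][col] += 1
    peak = max(max(r) for r in image)
    return [[cell / peak for cell in r] for r in image]


def chaogram_like(signal, delay=1, bins=32):
    """Stack three plane projections as channels, loosely analogous to RGB.

    Hypothetical stand-in for the Chaogram; the real construction may differ.
    """
    points = delay_embed(signal, delay)
    planes = [(0, 1), (0, 2), (1, 2)]
    return [project_to_image(points, a, b, bins) for a, b in planes]


# Usage: a synthetic sine wave stands in for a speech frame.
tone = [math.sin(2 * math.pi * 5 * n / 400) for n in range(400)]
channels = chaogram_like(tone, delay=8, bins=32)
print(len(channels), len(channels[0]), len(channels[0][0]))  # 3 32 32
```

The resulting three-channel array has the same layout as a small RGB image, which is what would make it usable as input to an image-classification network such as VGG after resizing.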
Pages: 449-492
Page count: 44
Related papers
50 in total
  • [31] Transfer Learning of Deep Neural Network for Speech Emotion Recognition
    Huang, Ying
    Hu, Mingqing
    Yu, Xianguo
    Wang, Tao
    Yang, Chen
    PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 721 - 729
  • [32] A NEW APPROACH FOR SPEECH EMOTION RECOGNITION USING SINGLE LAYERED CONVOLUTIONAL NEURAL NETWORK
    Mannan, J. Mannar
    Kumar, V. Vinoth
    Palaiahnakote, Shivakumara
    Khan, Surbhi Bhatia
    Almusharraf, Ahlam
    MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2024, 37 (01) : 89 - 106
  • [33] Optimal feature selection based speech emotion recognition using two-stream deep convolutional neural network
    Mustaqeem
    Kwon, Soonil
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2021, 36 (09) : 5116 - 5135
  • [34] A 3D Tensor Representation of Speech and 3D Convolutional Neural Network for Emotion Recognition
    Falahzadeh, Mohammad Reza
    Farokhi, Fardad
    Harimi, Ali
    Sabbaghi-Nadooshan, Reza
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (07) : 4271 - 4291
  • [35] Parallelized Convolutional Recurrent Neural Network With Spectral Features for Speech Emotion Recognition
    Jiang, Pengxu
    Fu, Hongliang
    Tao, Huawei
    Lei, Peizhi
    Zhao, Li
    IEEE ACCESS, 2019, 7 : 90368 - 90377
  • [36] Speech Emotion Recognition of Merged Features Based on Improved Convolutional Neural Network
    Peng, Wangyue
    Tang, Xiaoyu
    2019 2ND IEEE INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND SIGNAL PROCESSING (ICICSP), 2019 : 301 - 305
  • [37] Learning Deep Binaural Representations With Deep Convolutional Neural Networks for Spontaneous Speech Emotion Recognition
    Zhang, Shiqing
    Chen, Aihua
    Guo, Wenping
    Cui, Yueli
    Zhao, Xiaoming
    Liu, Limei
    IEEE ACCESS, 2020, 8 : 23496 - 23505
  • [38] Cross-Corpus Speech Emotion Recognition Based on Deep Domain-Adaptive Convolutional Neural Network
    Liu, Jiateng
    Zheng, Wenming
    Zong, Yuan
    Lu, Cheng
    Tang, Chuangao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (02) : 459 - 463
  • [39] Speech Emotion Recognition using Convolution Neural Networks and Deep Stride Convolutional Neural Networks
    Wani, Taiba Majid
    Gunawan, Teddy Surya
    Qadri, Syed Asif Ahmad
    Mansor, Hasmah
    Kartiwi, Mira
    Ismail, Nanang
    PROCEEDING OF 2020 6TH INTERNATIONAL CONFERENCE ON WIRELESS AND TELEMATICS (ICWT), 2020,
  • [40] Active Learning for Speech Emotion Recognition Using Deep Neural Network
    Abdelwahab, Mohammed
    Busso, Carlos
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2019,