Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition

Cited by: 17
Authors
Falahzadeh, Mohammad Reza [1]
Farokhi, Fardad [2]
Harimi, Ali [3]
Sabbaghi-Nadooshan, Reza [1]
Affiliations
[1] Islamic Azad Univ, Dept Elect Engn, Cent Tehran Branch, Tehran, Iran
[2] Islamic Azad Univ, Dept Biomed Engn, Cent Tehran Branch, Tehran, Iran
[3] Islamic Azad Univ, Dept Elect Engn, Shahrood Branch, Shahrood, Iran
Funding
UK Research and Innovation (UKRI)
Keywords
Speech emotion recognition; 3D tensor speech representation; Chaogram; Deep convolutional neural network; Gray wolf optimization algorithm; Reconstructed phase space; Spectral features; Classification; CNN
DOI
10.1007/s00034-022-02130-3
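Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract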
Speech emotion recognition (SER), an important method of emotional human-machine interaction, has been the focus of much research in recent years. Motivated by the ability of deep convolutional neural networks (DCNNs) to learn features and by the landmark success of these networks in image classification, the present study aims to prepare a pre-trained DCNN model for SER and to provide compatible input to these networks by converting a speech signal into a 3D tensor. First, speech samples are reconstructed in a 3D reconstructed phase space. Studies have shown that the patterns formed in this space contain meaningful emotional features of the speaker. To provide an input that is compatible with a DCNN, a new speech signal representation called Chaogram is introduced as the projection of these patterns, yielding three channels similar to those of an RGB image. In the next step, image enhancement techniques are used to highlight the details of the Chaogram images. Then, the Visual Geometry Group (VGG) DCNN pre-trained on the large ImageNet dataset is used to learn high-level Chaogram features and the corresponding emotion classes. Finally, transfer learning is performed, and the proposed model is fine-tuned on our datasets. To optimize the hyper-parameter arrangement of architecture-determined CNNs, an innovative DCNN-GWO (gray wolf optimization) approach is also presented. The results on two public emotion datasets, EMO-DB and eNTERFACE05, show the promising performance of the proposed model, which can greatly improve SER applications.
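The first two steps of the abstract (3D phase-space reconstruction, then projection into three image-like channels) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the projection is computed, so the use of time-delay embedding with a fixed `delay`, the choice of 2D occupancy histograms over the three coordinate planes, the `bins` value, and the function names `reconstruct_phase_space` and `chaogram_like_image` are all assumptions made for illustration. Image enhancement and the VGG fine-tuning stage are left out.

```python
import numpy as np


def reconstruct_phase_space(signal, delay=1, dim=3):
    """Time-delay embedding: row n is (x[n], x[n + delay], x[n + 2*delay])."""
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this delay and dimension")
    # Each column is the signal shifted by a multiple of the delay.
    return np.stack(
        [x[i * delay : i * delay + n_points] for i in range(dim)], axis=1
    )


def chaogram_like_image(signal, delay=1, bins=64):
    """Assumed construction: project the 3D trajectory onto its three
    coordinate planes and stack the 2D occupancy histograms as channels."""
    pts = reconstruct_phase_space(signal, delay=delay, dim=3)
    lo, hi = pts.min(), pts.max()
    if hi == lo:
        hi = lo + 1.0  # avoid zero-width bins for a constant signal
    edges = np.linspace(lo, hi, bins + 1)
    channels = []
    for a, b in [(0, 1), (1, 2), (0, 2)]:  # plane order is an assumption
        hist, _, _ = np.histogram2d(pts[:, a], pts[:, b], bins=[edges, edges])
        channels.append(hist)
    img = np.stack(channels, axis=-1)  # shape: (bins, bins, 3)
    peak = img.max()
    return img / peak if peak > 0 else img


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 1000)
    tone = np.sin(2 * np.pi * 5 * t)  # synthetic stand-in for a speech frame
    image = chaogram_like_image(tone, delay=10, bins=32)
    print(image.shape)
```

The resulting array has the height-by-width-by-3 layout that image networks pre-trained on ImageNet expect, which is the compatibility the abstract is after. A real pipeline would still need resizing to the network's input size and a principled choice of delay.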
Pages: 449-492
Page count: 44
Related papers
50 in total
  • [41] Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network
    Jiang, Wei
    Wang, Zheng
    Jin, Jesse S.
    Han, Xianfeng
    Li, Chunguang
    SENSORS, 2019, 19 (12)
  • [42] 3D Convolutional Neural Network for Speech Emotion Recognition With Its Realization on Intel CPU and NVIDIA GPU
    Falahzadeh, Mohammad Reza
    Farsa, Edris Zaman
    Harimi, Ali
    Ahmadi, Arash
    Abraham, Ajith
    IEEE ACCESS, 2022, 10 : 112460 - 112471
  • [43] Speech Emotion Recognition Based on Deep Belief Network
    Shi, Peng
    2018 IEEE 15TH INTERNATIONAL CONFERENCE ON NETWORKING, SENSING AND CONTROL (ICNSC), 2018,
  • [44] Speech Emotion Recognition based on Multi-Level Residual Convolutional Neural Networks
    Zheng, Kai
    Xia, ZhiGuang
    Zhang, Yi
    Xu, Xuan
    Fu, Yaqin
    ENGINEERING LETTERS, 2020, 28 (02) : 559 - 565
  • [45] Spontaneous Speech Emotion Recognition Using Multiscale Deep Convolutional LSTM
    Zhang, Shiqing
    Zhao, Xiaoming
    Tian, Qi
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (02) : 680 - 688
  • [46] Speech Emotion Recognition using Convolutional Recurrent Neural Networks and Spectrograms
    Qamhan, Mustafa A.
    Meftah, Ali H.
    Selouani, Sid-Ahmed
    Alotaibi, Yousef A.
    Zakariah, Mohammed
    Seddiq, Yasser Mohammad
    2020 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2020,
  • [47] 3D Convolutional Recurrent Global Neural Network for Speech Emotion Recognition
    Zayene, Baraa
    Jlassi, Chiraz
    Arous, Najet
    2020 5TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP'2020), 2020,
  • [48] Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network
    Sun, Congshan
    Li, Haifeng
    Ma, Lin
    FRONTIERS IN PSYCHOLOGY, 2023, 13
  • [49] Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network
    Guo, Lili
    Wang, Longbiao
    Dang, Jianwu
    Zhang, Linjuan
    Guan, Haotian
    Li, Xiangang
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1611 - 1615
  • [50] LIGHT-SERNET: A LIGHTWEIGHT FULLY CONVOLUTIONAL NEURAL NETWORK FOR SPEECH EMOTION RECOGNITION
    Aftab, Arya
    Morsali, Alireza
    Ghaemmaghami, Shahrokh
    Champagne, Benoit
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6912 - 6916