Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network

Cited by: 26
Authors
Guo, Lili [1 ]
Wang, Longbiao [1 ]
Dang, Jianwu [1 ,2 ]
Zhang, Linjuan [1 ]
Guan, Haotian [3 ]
Li, Xiangang [4 ]
Affiliations
[1] Tianjin Univ, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[3] Intelligent Spoken Language Technol Tianjin Co, Tianjin, Peoples R China
[4] Didi Chuxing, AI Labs, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speech emotion recognition; amplitude; phase information; convolutional neural network;
DOI
10.21437/Interspeech.2018-2156
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Previous studies of speech emotion recognition apply a convolutional neural network (CNN) directly to the amplitude spectrogram to extract features. A CNN combined with bidirectional long short-term memory (BLSTM) has become the state-of-the-art model. However, this model ignores phase information, whose importance in the speech processing field is gaining attention. In this paper, we propose CNN-based feature extraction from both the amplitude spectrogram and phase information for speech emotion recognition. The modified group delay cepstral coefficient (MGDCC) and relative phase are used as phase information. First, we analyze the influence of phase information on speech emotion recognition. Then, we design a CNN-based feature representation using amplitude and phase information. Finally, we conduct experiments on EmoDB to validate the effectiveness of phase information. Integrating the amplitude spectrogram with phase information reduces the relative emotion recognition error rate by over 33% compared with using only amplitude-based features.
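To make the amplitude/phase distinction concrete, the following NumPy sketch frames a waveform, takes a short-time Fourier transform, and splits it into the log-amplitude spectrogram that an amplitude-only CNN consumes and the wrapped phase that such a model discards. The Hann window, 512-point FFT, and 160-sample hop are illustrative assumptions, not the paper's settings, and the function name `stft_amplitude_and_phase` is hypothetical.

```python
import numpy as np

def stft_amplitude_and_phase(x, n_fft=512, hop=160):
    """Hypothetical sketch: split an STFT into log-amplitude and wrapped phase.

    Assumes len(x) >= n_fft; window, FFT size and hop are illustrative choices.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)       # complex STFT, shape (n_frames, n_fft // 2 + 1)
    log_amp = np.log(np.abs(spec) + 1e-10)   # amplitude part: what an amplitude-only CNN sees
    phase = np.angle(spec)                   # phase part, wrapped to [-pi, pi]; usually discarded
    return log_amp, phase
```

Both arrays have the same time-frequency layout, so either can be fed to a 2-D CNN; the point of the paper is that the second one carries information the first does not.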
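A minimal sketch of relative phase for a single frame, assuming the formulation commonly used in earlier phase-based speech work: the phase at a base frequency ω_b (assumed here to be 1000 Hz) is normalized to zero, every other bin's phase θ(ω) is shifted by (ω/ω_b)(−θ(ω_b)), and the result is mapped to (cos, sin) to avoid wrapping discontinuities. The base frequency, the use of all FFT bins, and the helper name `relative_phase` are assumptions; the abstract does not state the paper's exact bin selection or framing.

```python
import numpy as np

def relative_phase(frame, sr, base_hz=1000.0, n_fft=None):
    """Hypothetical sketch: relative phase features for one windowed frame.

    The phase at the FFT bin closest to base_hz is normalized to 0, and the phase
    of every bin is shifted by (f / f_base) * (-theta_base). Returns (cos, sin).
    """
    n_fft = n_fft or len(frame)
    spec = np.fft.rfft(frame, n=n_fft)
    theta = np.angle(spec)                       # wrapped phase per bin
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)   # bin centre frequencies in Hz
    b = int(np.argmin(np.abs(freqs - base_hz)))  # index of the base-frequency bin
    if freqs[b] == 0.0:
        raise ValueError("base frequency maps to the DC bin; use a larger n_fft")
    theta_rel = theta + (freqs / freqs[b]) * (-theta[b])
    # Mapping to the unit circle avoids the 2*pi discontinuity of raw phase values.
    return np.cos(theta_rel), np.sin(theta_rel)
```

With a 16 kHz signal and a 512-point FFT, the base bin sits exactly at 1000 Hz, where the sketch returns cos = 1 and sin = 0 by construction.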
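A sketch of MGDCC extraction for one frame, assuming the standard modified group delay function τ(k) = (X_R Y_R + X_I Y_I) / S^(2γ), where X is the DFT of x[n], Y is the DFT of n·x[n], and S is a cepstrally smoothed magnitude spectrum; τ is then compressed as sign(τ)|τ|^α and a DCT yields cepstral coefficients. The values α = 0.4, γ = 0.9, a lifter of 30, and 13 output coefficients are illustrative assumptions rather than values reported in this abstract, and `mgdcc` is a hypothetical name. An unnormalized DCT-II is written out in NumPy to avoid extra dependencies.

```python
import numpy as np

def mgdcc(frame, n_fft=512, alpha=0.4, gamma=0.9, lifter=30, n_ceps=13, eps=1e-10):
    """Hypothetical sketch of modified group delay cepstral coefficients for one frame."""
    if len(frame) > n_fft:
        raise ValueError("frame is longer than n_fft")
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)      # DFT of x[n]
    Y = np.fft.rfft(n * frame, n_fft)  # DFT of n * x[n]

    # Cepstrally smoothed magnitude S: keep only low-quefrency cepstral terms.
    ceps = np.fft.irfft(np.log(np.abs(X) + eps), n_fft)  # real, even-symmetric cepstrum
    ceps[lifter:n_fft - lifter + 1] = 0.0                 # zero high quefrencies symmetrically
    S = np.exp(np.fft.rfft(ceps, n_fft).real)

    # Modified group delay function and its compression by alpha.
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma) + eps)
    tau_m = np.sign(tau) * np.abs(tau) ** alpha

    # Unnormalized DCT-II of tau_m, keeping the first n_ceps coefficients.
    K = len(tau_m)
    k = np.arange(K)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * k + 1) / (2 * K))
    return basis @ tau_m
```

Stacking such per-frame vectors over time gives a phase-based feature map with the same frame rate as the amplitude spectrogram.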
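The "over 33%" figure is a relative reduction, not an absolute one. Assuming the usual definition, the relative reduction is (E_amp − E_amp+phase) / E_amp, where E_amp is the error rate with amplitude-only features. The numbers below are hypothetical, chosen only to show the arithmetic; they are not results from the paper, and `relative_error_reduction` is an illustrative name.

```python
def relative_error_reduction(err_baseline, err_new):
    """Relative reduction of an error rate, as a fraction of the baseline error."""
    if err_baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (err_baseline - err_new) / err_baseline

# Hypothetical numbers, NOT results from the paper:
print(round(relative_error_reduction(0.30, 0.20), 3))  # 0.333
```

In this hypothetical case a 10-point absolute drop from a 30% error rate corresponds to a one-third relative reduction.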
Pages: 1611-1615
Number of pages: 5