Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network

Cited by: 27
Authors
Guo, Lili [1]
Wang, Longbiao [1]
Dang, Jianwu [1, 2]
Zhang, Linjuan [1]
Guan, Haotian [3]
Li, Xiangang [4]
Affiliations
[1] Tianjin Univ, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[3] Intelligent Spoken Language Technol Tianjin Co, Tianjin, Peoples R China
[4] Didi Chuxing, AI Labs, Beijing, Peoples R China
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Funding
National Natural Science Foundation of China
Keywords
speech emotion recognition; amplitude; phase information; convolutional neural network
DOI
10.21437/Interspeech.2018-2156
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Previous studies of speech emotion recognition apply convolutional neural networks (CNNs) directly to the amplitude spectrogram to extract features, and a CNN combined with bidirectional long short-term memory (BLSTM) has become the state-of-the-art model. However, this model ignores phase information, whose importance in speech processing is attracting growing attention. In this paper, we propose using a CNN to extract features from both the amplitude spectrogram and phase information for speech emotion recognition. The modified group delay cepstral coefficient (MGDCC) and relative phase are used as the phase information. First, we analyze the influence of phase information on speech emotion recognition. Then, we design a CNN-based feature representation using amplitude and phase information. Finally, we conduct experiments on EmoDB to validate the effectiveness of phase information. Integrating the amplitude spectrogram with phase information reduces the relative emotion recognition error rate by over 33% compared with using only amplitude-based features.
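The abstract names three feature streams (amplitude spectrogram, MGDCC, relative phase) without giving formulas. The Python/NumPy sketch below shows one plausible way to compute each per frame; it is not the authors' implementation. The function names, the 16 kHz sampling rate, the 25 ms / 10 ms framing, the FFT size, the group-delay exponents `alpha=0.4` and `gamma=0.9`, the cepstral lifter length, and the 1 kHz base frequency are all assumptions chosen for illustration. MGDCC here follows the commonly used modified group delay function, tau(w) = (X_R Y_R + X_I Y_I) / S(w)^(2*gamma), compressed as sign(tau)|tau|^alpha and decorrelated with a DCT, where Y is the spectrum of n*x[n] and S is a cepstrally smoothed magnitude spectrum. Relative phase shifts each frame's phase linearly in frequency so that the phase at the base frequency becomes zero, then encodes it as cosine and sine; published relative-phase work typically also uses pseudo pitch-synchronous framing, which this sketch omits. The relative error reduction quoted in the abstract is (E_amp - E_comb) / E_amp.

```python
import numpy as np


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into Hamming-windowed frames (assumed 25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)


def log_amplitude_spectrogram(frames, n_fft=512):
    """Log-magnitude spectrum per frame: the amplitude-only input."""
    return np.log(np.abs(np.fft.rfft(frames, n=n_fft, axis=-1)) + 1e-10)


def mgdcc(frames, n_fft=512, n_ceps=13, alpha=0.4, gamma=0.9, lifter=30):
    """Hypothetical MGDCC: modified group delay function followed by a DCT-II."""
    n = np.arange(frames.shape[-1])
    X = np.fft.rfft(frames, n=n_fft, axis=-1)        # spectrum of x[n]
    Y = np.fft.rfft(frames * n, n=n_fft, axis=-1)    # spectrum of n * x[n]

    # Cepstrally smoothed magnitude S(w): keep only the low-quefrency cepstrum.
    ceps = np.fft.irfft(np.log(np.abs(X) + 1e-10), n=n_fft, axis=-1)
    ceps[..., lifter:n_fft - lifter + 1] = 0.0
    S = np.exp(np.fft.rfft(ceps, n=n_fft, axis=-1).real)

    # Modified group delay, then sign-preserving compression by alpha.
    tau = (X.real * Y.real + X.imag * Y.imag) / S ** (2.0 * gamma)
    tau_m = np.sign(tau) * np.abs(tau) ** alpha

    # Unnormalized DCT-II over frequency bins; keep the first n_ceps coefficients.
    n_bins = tau_m.shape[-1]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bins)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n_bins))
    return tau_m @ basis.T


def relative_phase(frames, sr=16000, n_fft=512, base_hz=1000.0):
    """Relative phase: shift each frame's phase so the base-frequency bin is 0."""
    theta = np.angle(np.fft.rfft(frames, n=n_fft, axis=-1))
    b = int(round(base_hz * n_fft / sr))             # bin nearest the base frequency
    ratio = np.arange(theta.shape[-1]) / b           # omega / omega_b
    theta_rel = theta - ratio[None, :] * theta[:, b:b + 1]
    return np.concatenate([np.cos(theta_rel), np.sin(theta_rel)], axis=-1)


def relative_error_reduction(err_baseline, err_new):
    """Relative error-rate reduction: (E_baseline - E_new) / E_baseline."""
    return (err_baseline - err_new) / err_baseline
```

A CNN input could then stack these per-frame features along a feature or channel axis; how the paper actually arranges and fuses the amplitude and phase streams is not stated in this record.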
Pages: 1611-1615
Number of pages: 5
Related papers (50 in total)
  • [21] APIN: Amplitude- and phase-aware interaction network for speech emotion recognition
    Guo, Lili
    Li, Jie
    Ding, Shifei
    Dang, Jianwu
    SPEECH COMMUNICATION, 2025, 169
  • [22] Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching
    Zhang, Shiqing
    Zhang, Shiliang
    Huang, Tiejun
    Gao, Wen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (06) : 1576 - 1590
  • [23] Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network
    Farooq, Misbah
    Hussain, Fawad
    Baloch, Naveed Khan
    Raja, Fawad Riasat
    Yu, Heejung
    Zikria, Yousaf Bin
    SENSORS, 2020, 20 (21) : 1 - 18
  • [24] Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions
    Nam, Youngja
    Lee, Chankyu
    SENSORS, 2021, 21 (13)
  • [25] A Method of Speech Coding for Speech Recognition Using a Convolutional Neural Network
    Kubanek, Mariusz
    Bobulski, Janusz
    Kulawik, Joanna
    SYMMETRY-BASEL, 2019, 11 (09) : 1 - 12
  • [26] Speech emotion recognition with deep convolutional neural networks
    Issa, Dias
    Demirci, M. Fatih
    Yazici, Adnan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
  • [27] Effect on speech emotion classification of a feature selection approach using a convolutional neural network
    Amjad, Ammar
    Khan, Lal
    Chang, Hsien-Tsung
    PEERJ COMPUTER SCIENCE, 2021, 7
  • [28] Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks
    Ristea, Nicolae-Catalin
    Dutu, Liviu Cristian
    Radoi, Anamaria
    2019 10TH INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2019
  • [29] Speech Emotion Recognition Using Neural Network and Wavelet Features
    Roy, Tanmoy
    Marwala, Tshilidzi
    Chakraverty, S.
    RECENT TRENDS IN WAVE MECHANICS AND VIBRATIONS, WMVC 2018, 2020 : 427 - 438
  • [30] Dysarthric Speech Recognition Using Convolutional LSTM Neural Network
    Kim, Myungjong
    Cao, Beiming
    An, Kwanghoon
    Wang, Jun
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018 : 2948 - 2952