Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network

被引:26
|
作者
Guo, Lili [1 ]
Wang, Longbiao [1 ]
Dang, Jianwu [1 ,2 ]
Zhang, Linjuan [1 ]
Guan, Haotian [3 ]
Li, Xiangang [4 ]
机构
[1] Tianjin Univ, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[3] Intelligent Spoken Language Technol Tianjin Co, Tianjin, Peoples R China
[4] Didi Chuxing, AI Labs, Beijing, Peoples R China
基金
中国国家自然科学基金;
关键词
speech emotion recognition; amplitude; phase information; convolutional neural network;
D O I
10.21437/Interspeech.2018-2156
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Previous studies of speech emotion recognition utilize convolutional neural network (CNN) directly on amplitude spectrogram to extract features. CNN combines with bidirectional long short term memory (BLSTM) has become the state-of-the-art model. However, phase information has been ignored in this model. The importance of phase information in speech processing field is gathering attention. In this paper, we propose feature extraction of amplitude spectrogram and phase information using CNN for speech emotion recognition. The modified group delay cepstral coefficient (MGDCC) and relative phase are used as phase information. Firstly, we analyze the influence of phase information on speech emotion recognition. Then we design a CNN-based feature representation using amplitude and phase information. Finally, experiments were conducted on EmoDB to validate the effectiveness of phase information. Integrating amplitude spectrogram with phase information, the relative emotion error recognition rates are reduced by over 33% in comparison with using only amplitude-based feature.
引用
收藏
页码:1611 / 1615
页数:5
相关论文
共 50 条
  • [21] Constructing Speech Emotion Recognition Model Based on Convolutional Neural Network
    Kuo, Jong-Yih
    Chen, Zhao-Ming
    Lin, Hui-Chi
    2021 28TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE WORKSHOPS (APSECW 2021), 2021, : 52 - 56
  • [22] Speech based emotion recognition by using a faster region-based convolutional neural network
    Suneetha C.
    Anitha R.
    Multimedia Tools and Applications, 2025, 84 (8) : 5205 - 5237
  • [23] Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching
    Zhang, Shiqing
    Zhang, Shiliang
    Huang, Tiejun
    Gao, Wen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (06) : 1576 - 1590
  • [24] Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network
    Farooq, Misbah
    Hussain, Fawad
    Baloch, Naveed Khan
    Raja, Fawad Riasat
    Yu, Heejung
    Zikria, Yousaf Bin
    SENSORS, 2020, 20 (21) : 1 - 18
  • [25] Speech Emotion Recognition Based on Multi-Task Learning Using a Convolutional Neural Network
    Kim, Nam Kyun
    Lee, Jiwon
    Ha, Hun Kyu
    Lee, Geon Woo
    Lee, Jung Hyuk
    Kim, Hong Kook
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 704 - 707
  • [26] Speech Emotion Recognition using Convolutional Neural Network with Audio Word-based Embedding
    Huang, Kun-Yi
    Wu, Chung-Hsien
    Hong, Qian-Bei
    Su, Ming-Hsiang
    Zeng, Yuan-Rong
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 265 - 269
  • [27] Emotion recognition based on group phase locking value using convolutional neural network
    Gaochao Cui
    Xueyuan Li
    Hideaki Touyama
    Scientific Reports, 13
  • [28] Emotion recognition based on group phase locking value using convolutional neural network
    Cui, Gaochao
    Li, Xueyuan
    Touyama, Hideaki
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [29] Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism
    Mountzouris, Konstantinos
    Perikos, Isidoros
    Hatzilygeroudis, Ioannis
    Corchado, Juan M.
    Iglesias, Carlos A.
    Kim, Byung-Gyu
    Mehmood, Rashid
    Ren, Fuji
    Lee, In
    ELECTRONICS, 2023, 12 (20)
  • [30] Enhancing Speech Emotion Recognition Using Deep Convolutional Neural Networks
    Islam, M. M. Manjurul
    Kabir, Md Alamgir
    Sheikh, Alamin
    Saiduzzaman, Muhammad
    Hafid, Abdelakram
    Abdullah, Saad
    PROCEEDINGS OF THE 2024 9TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING TECHNOLOGIES, ICMLT 2024, 2024, : 95 - 100