Speaker-Aware Speech Emotion Recognition by Fusing Amplitude and Phase Information

Cited by: 1
Authors
Guo, Lili [1 ]
Wang, Longbiao [1 ]
Dang, Jianwu [1 ,2 ]
Liu, Zhilei [1 ]
Guan, Haotian [3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[3] Huiyan Technol Tianjin Co Ltd, Tianjin, Peoples R China
Source
MULTIMEDIA MODELING (MMM 2020), PT I | 2020 / Vol. 11961
Funding
National Natural Science Foundation of China;
Keywords
Speech emotion recognition; Amplitude spectrogram; Phase information; Modified group delay; Speaker information; CLASSIFICATION; FEATURES;
DOI
10.1007/978-3-030-37731-1_2
CLC Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Extracting deep acoustic features from spectrograms with a convolutional neural network (CNN) has become one of the most common approaches to speech emotion recognition. In most such studies, however, only the amplitude spectrogram is used as input, with no special attention paid to phase-related or speaker-related information. In this paper, we propose a multi-channel method that employs amplitude and phase channels for speech emotion recognition. Two separate CNN channels extract deep features from amplitude spectrograms and modified group delay (MGD) spectrograms, and a concatenation layer then fuses the two feature sets. Furthermore, to obtain more robust features, speaker information is incorporated during emotional feature extraction. Finally, the speaker-aware fused features are fed into an extreme learning machine (ELM) to classify emotions. Experiments are conducted on the Emo-DB database to evaluate the proposed model. Results show an average F1 of 94.82%, a 39.27% relative error reduction over the baseline CNN-ELM model that uses amplitude-only spectrograms.
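To make the described pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the two-channel design the abstract outlines: one CNN branch for amplitude spectrograms, one for MGD spectrograms, concatenation of the two feature vectors together with a speaker cue, and an ELM-style classifier on the fused features. All layer sizes, the use of a learned speaker embedding, and the ELM formulation are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class SpectrogramBranch(nn.Module):
    """One CNN channel mapping a (1, freq, time) spectrogram to a feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.fc = nn.Linear(32 * 4 * 4, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))


class TwoChannelEmotionNet(nn.Module):
    """Amplitude branch + MGD branch, fused with a speaker embedding (assumed form)."""
    def __init__(self, n_speakers=10, n_emotions=7, feat_dim=128, spk_dim=32):
        super().__init__()
        self.amp_branch = SpectrogramBranch(feat_dim)
        self.mgd_branch = SpectrogramBranch(feat_dim)
        self.spk_embed = nn.Embedding(n_speakers, spk_dim)  # speaker-aware cue
        self.head = nn.Linear(2 * feat_dim + spk_dim, n_emotions)

    def forward(self, amp_spec, mgd_spec, speaker_id):
        fused = torch.cat([self.amp_branch(amp_spec),
                           self.mgd_branch(mgd_spec),
                           self.spk_embed(speaker_id)], dim=1)
        # Return logits and the fused features (the latter e.g. for an ELM classifier).
        return self.head(fused), fused


def elm_fit(features, one_hot_labels, hidden=256):
    """One common ELM formulation: fixed random hidden weights, output weights by least squares."""
    w = torch.randn(features.shape[1], hidden)      # random, untrained input weights
    h = torch.tanh(features @ w)                    # hidden-layer activations
    beta = torch.linalg.pinv(h) @ one_hot_labels    # closed-form output weights
    return w, beta
```

As a usage note, one would train the two CNN branches on emotion labels (optionally with speaker labels as an auxiliary signal), then fit the ELM on the fused feature vectors extracted from the trained network.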
Pages: 14-25
Page count: 12
Related Papers
50 records in total (first 10 shown)
  • [1] SAPBERT: Speaker-Aware Pretrained BERT for Emotion Recognition in Conversation
    Lim, Seunguook
    Kim, Jihie
    ALGORITHMS, 2023, 16 (01)
  • [2] Speaker-Aware Interactive Graph Attention Network for Emotion Recognition in Conversation
    Jia, Zhaohong
    Shi, Yunwei
    Liu, Weifeng
    Huang, Zhenhua
    Sun, Xiao
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (12)
  • [3] SPEAKER-AWARE SPEECH-TRANSFORMER
    Fan, Zhiyun
    Li, Jie
    Zhou, Shiyu
    Xu, Bo
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 222 - 229
  • [4] Speaker-Aware Monaural Speech Separation
    Xu, Jiahao
    Hu, Kun
    Xu, Chang
    Duc Chung Tran
    Wang, Zhiyong
    INTERSPEECH 2020, 2020, : 1451 - 1455
  • [5] Speaker-Aware Multi-Task Learning for Automatic Speech Recognition
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2900 - 2905
  • [6] Speaker-aware Cross-modal Fusion Architecture for Conversational Emotion Recognition
    Zhao, Huan
    Li, Bo
    Zhang, Zixing
    INTERSPEECH 2023, 2023, : 2718 - 2722
  • [7] Speaker-Aware Speech Enhancement with Self-Attention
    Lin, Ju
    Van Wijngaarden, Adriaan J.
    Smith, Melissa C.
    Wang, Kuang-Ching
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 486 - 490
  • [8] Speaker-aware cognitive network with cross-modal attention for multimodal emotion recognition in conversation
    Guo, Lili
    Song, Yikang
    Ding, Shifei
    KNOWLEDGE-BASED SYSTEMS, 2024, 296
  • [9] Speaker-aware neural network based beamformer for speaker extraction in speech mixtures
    Zmolikova, Katerina
    Delcroix, Marc
    Kinoshita, Keisuke
    Higuchi, Takuya
    Ogawa, Atsunori
    Nakatani, Tomohiro
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2655 - 2659
  • [10] Speaker-aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement
    Chuang, Fu-Kai
    Wang, Syu-Siang
    Hung, Jeih-weih
    Tsao, Yu
    Fang, Shih-Hau
    INTERSPEECH 2019, 2019, : 3173 - 3177