Speaker-Aware Speech Emotion Recognition by Fusing Amplitude and Phase Information

Cited by: 1
Authors
Guo, Lili [1 ]
Wang, Longbiao [1 ]
Dang, Jianwu [1 ,2 ]
Liu, Zhilei [1 ]
Guan, Haotian [3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[3] Huiyan Technol Tianjin Co Ltd, Tianjin, Peoples R China
Source
MULTIMEDIA MODELING (MMM 2020), PT I | 2020 / Vol. 11961
Funding
National Natural Science Foundation of China;
关键词
Speech emotion recognition; Amplitude spectrogram; Phase information; Modified group delay; Speaker information; CLASSIFICATION; FEATURES;
DOI
10.1007/978-3-030-37731-1_2
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The use of a convolutional neural network (CNN) to extract deep acoustic features from spectrograms has become one of the most common approaches to speech emotion recognition. In those studies, however, the plain amplitude spectrogram is typically chosen as input, with no special attention to phase-related or speaker-related information. In this paper, we propose a multi-channel method that employs both amplitude and phase channels for speech emotion recognition. Two separate CNN channels are adopted to extract deep features from amplitude spectrograms and modified group delay (MGD) spectrograms, and a concatenation layer is then used to fuse the features. Furthermore, to obtain more robust features, speaker information is incorporated at the emotional feature extraction stage. Finally, the fused features, which account for speaker-related information, are fed into an extreme learning machine (ELM) to distinguish emotions. Experiments conducted on the Emo-DB database show that the proposed model achieves an average F1 of 94.82%, significantly outperforming the baseline CNN-ELM model based on amplitude-only spectrograms with a 39.27% relative error reduction.
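The phase channel of the model builds on modified group delay (MGD) spectrograms. A minimal per-frame sketch of the standard MGD computation is shown below; note that it is an assumption-laden simplification of the general MGD definition, not the paper's exact implementation. In particular, the cepstrally smoothed magnitude spectrum normally used in the denominator is replaced here by the raw magnitude, and the exponents `alpha` and `gamma` are illustrative values, not the ones used in the paper.

```python
import numpy as np

def modified_group_delay(frame, alpha=0.4, gamma=0.9, n_fft=512):
    """Per-frame modified group delay spectrum (simplified sketch).

    Assumptions: raw |X| stands in for the cepstrally smoothed
    spectrum of the full MGD definition; alpha/gamma are illustrative.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)          # spectrum of x(n)
    Y = np.fft.rfft(n * frame, n_fft)      # spectrum of n * x(n)
    S = np.abs(X) + 1e-10                  # magnitude, floored for stability
    # Group delay via the FFT identity, with gamma-compressed denominator
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma))
    # alpha-compression preserving sign
    return np.sign(tau) * (np.abs(tau) ** alpha)
```

Stacking this function's output over successive windowed frames yields the MGD spectrogram that feeds the phase-channel CNN, analogous to how the log-amplitude spectrogram feeds the amplitude channel.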
Pages: 14-25
Page count: 12