Speaker-Aware Speech Emotion Recognition by Fusing Amplitude and Phase Information

Cited by: 1
Authors
Guo, Lili [1 ]
Wang, Longbiao [1 ]
Dang, Jianwu [1 ,2 ]
Liu, Zhilei [1 ]
Guan, Haotian [3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[3] Huiyan Technol Tianjin Co Ltd, Tianjin, Peoples R China
Source
MULTIMEDIA MODELING (MMM 2020), PT I | 2020, Vol. 11961
Funding
National Natural Science Foundation of China
Keywords
Speech emotion recognition; Amplitude spectrogram; Phase information; Modified group delay; Speaker information; CLASSIFICATION; FEATURES;
DOI
10.1007/978-3-030-37731-1_2
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The use of a convolutional neural network (CNN) for extracting deep acoustic features from spectrograms has become one of the most common approaches to speech emotion recognition. In those studies, however, only amplitude information is typically used as input, with no special attention to phase-related or speaker-related information. In this paper, we propose a multi-channel method that employs both amplitude and phase channels for speech emotion recognition. Two separate CNN channels are adopted to extract deep features from amplitude spectrograms and modified group delay (MGD) spectrograms, and a concatenation layer is then used to fuse the features. Furthermore, to obtain more robust features, speaker information is incorporated during emotional feature extraction. Finally, the speaker-aware fused features are fed into an extreme learning machine (ELM) to classify emotions. Experiments are conducted on the Emo-DB database to evaluate the proposed model. Results show an average F1 of 94.82%, which significantly outperforms the baseline CNN-ELM model based on amplitude-only spectrograms, corresponding to a 39.27% relative error reduction.
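The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: spectrogram sizes, layer widths, the speaker-embedding input, and the ELM hidden width are all assumed placeholders.

# Hypothetical sketch of the two-channel (amplitude + MGD) fusion model with an
# ELM classifier, as outlined in the abstract. All hyperparameters are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def cnn_branch(name, input_shape=(128, 128, 1)):
    """One CNN channel that maps a spectrogram to a deep feature vector."""
    inp = layers.Input(shape=input_shape, name=f"{name}_spectrogram")
    x = layers.Conv2D(32, (3, 3), activation="relu")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)
    return inp, layers.Dense(256, activation="relu", name=f"{name}_features")(x)

# Two separate CNN channels: amplitude spectrogram and MGD (phase-based) spectrogram.
amp_in, amp_feat = cnn_branch("amplitude")
mgd_in, mgd_feat = cnn_branch("mgd")

# Concatenation-layer fusion of the two feature streams; a speaker embedding
# (placeholder input) is appended so the fused representation is speaker-aware.
spk_in = layers.Input(shape=(64,), name="speaker_embedding")
fused = layers.Concatenate(name="fusion")([amp_feat, mgd_feat, spk_in])
extractor = Model(inputs=[amp_in, mgd_in, spk_in], outputs=fused)

def elm_fit(features, labels, hidden=1024, seed=0):
    """Minimal extreme learning machine: random hidden layer + least-squares output."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((features.shape[1], hidden))
    b = rng.standard_normal(hidden)
    h = np.tanh(features @ w + b)        # random nonlinear projection
    beta = np.linalg.pinv(h) @ labels    # closed-form output weights (labels one-hot)
    return w, b, beta

def elm_predict(features, w, b, beta):
    return np.tanh(features @ w + b) @ beta

In use, the deep fused features produced by extractor would be fed to elm_fit for training and elm_predict for classification, mirroring the CNN-feature-plus-ELM split described above.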
Pages: 14-25
Page count: 12
Related Papers
50 items in total
  • [41] Graph Learning Based Speaker Independent Speech Emotion Recognition
    Xu, Xinzhou
    Huang, Chengwei
    Wu, Chen
    Wang, Qingyun
    Zhao, Li
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2014, 14 (02) : 17 - 22
  • [42] RobinNet: A Multimodal Speech Emotion Recognition System With Speaker Recognition for Social Interactions
    Khurana, Yash
    Gupta, Swamita
    Sathyaraj, R.
    Raja, S. P.
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2022, 11 (01) : 478 - 487
  • [43] Speech-Visual Emotion Recognition by Fusing Shared and Specific Features
    Chen, Guanghui
    Jiao, Shuang
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 678 - 682
  • [44] Human emotion recognition by optimally fusing facial expression and speech feature
    Wang, Xusheng
    Chen, Xing
    Cao, Congjun
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 84
  • [46] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
    Fahad, Md Shah
    Ranjan, Ashish
    Deepak, Akshay
    Pradhan, Gayadhar
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (11) : 6113 - 6135
  • [47] Exploitation of Phase Information for Speaker Recognition
    Wang, Ning
    Ching, P. C.
    Lee, Tan
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2126 - 2129
  • [48] Information access using speech, speaker and face recognition
    Viswanathan, M
    Beigi, HSM
    Tritschler, A
    Maali, F
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 493 - 496
  • [49] SPEECH EMOTION RECOGNITION USING SEMANTIC INFORMATION
    Tzirakis, Panagiotis
    Anh Nguyen
    Zafeiriou, Stefanos
    Schuller, Bjoern W.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6279 - 6283
  • [50] A Study on the Search of the Most Discriminative Speech Features in the Speaker Dependent Speech Emotion Recognition
    Pao, Tsang-Long
    Wang, Chun-Hsiang
    Li, Yu-Ji
    2012 FIFTH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND PROGRAMMING (PAAP), 2012, : 157 - 162