Speaker-Aware Speech Emotion Recognition by Fusing Amplitude and Phase Information

Cited by: 1
Authors
Guo, Lili [1 ]
Wang, Longbiao [1 ]
Dang, Jianwu [1 ,2 ]
Liu, Zhilei [1 ]
Guan, Haotian [3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[3] Huiyan Technol Tianjin Co Ltd, Tianjin, Peoples R China
Source
MULTIMEDIA MODELING (MMM 2020), PT I | 2020 / Vol. 11961
Funding
National Natural Science Foundation of China;
Keywords
Speech emotion recognition; Amplitude spectrogram; Phase information; Modified group delay; Speaker information; CLASSIFICATION; FEATURES;
DOI
10.1007/978-3-030-37731-1_2
CLC Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Extracting deep acoustic features from spectrograms with a convolutional neural network (CNN) has become one of the most common approaches to speech emotion recognition. In most such studies, however, only the amplitude spectrogram is used as input, with no special attention paid to phase-related or speaker-related information. In this paper, we propose a multi-channel method that employs amplitude and phase channels for speech emotion recognition. Two separate CNN channels extract deep features from amplitude spectrograms and modified group delay (MGD) spectrograms, and a concatenation layer then fuses the two feature sets. Furthermore, to obtain more robust features, speaker information is incorporated during emotional feature extraction. Finally, the speaker-aware fused features are fed into an extreme learning machine (ELM) to classify emotions. Experiments are conducted on the Emo-DB database to evaluate the proposed model. Results show an average F1 of 94.82%, a 39.27% relative error reduction over the baseline CNN-ELM model that uses amplitude-only spectrograms.
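To make the described pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the two-channel design the abstract outlines: one CNN branch for amplitude spectrograms, one for MGD spectrograms, concatenation of the two feature vectors together with a speaker cue, and an ELM-style classifier on the fused features. All layer sizes, the use of a learned speaker embedding, and the ELM formulation are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class SpectrogramBranch(nn.Module):
    """One CNN channel mapping a (1, freq, time) spectrogram to a feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.fc = nn.Linear(32 * 4 * 4, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))


class TwoChannelEmotionNet(nn.Module):
    """Amplitude branch + MGD branch, fused with a speaker embedding (assumed form)."""
    def __init__(self, n_speakers=10, n_emotions=7, feat_dim=128, spk_dim=32):
        super().__init__()
        self.amp_branch = SpectrogramBranch(feat_dim)
        self.mgd_branch = SpectrogramBranch(feat_dim)
        self.spk_embed = nn.Embedding(n_speakers, spk_dim)  # speaker-aware cue
        self.head = nn.Linear(2 * feat_dim + spk_dim, n_emotions)

    def forward(self, amp_spec, mgd_spec, speaker_id):
        fused = torch.cat([self.amp_branch(amp_spec),
                           self.mgd_branch(mgd_spec),
                           self.spk_embed(speaker_id)], dim=1)
        # Return logits and the fused features (the latter e.g. for an ELM classifier).
        return self.head(fused), fused


def elm_fit(features, one_hot_labels, hidden=256):
    """One common ELM formulation: fixed random hidden weights, output weights by least squares."""
    w = torch.randn(features.shape[1], hidden)      # random, untrained input weights
    h = torch.tanh(features @ w)                    # hidden-layer activations
    beta = torch.linalg.pinv(h) @ one_hot_labels    # closed-form output weights
    return w, beta
```

As a usage note, one would train the two CNN branches on emotion labels (optionally with speaker labels as an auxiliary signal), then fit the ELM on the fused feature vectors extracted from the trained network.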
Pages: 14-25
Page count: 12
Related Papers
50 records in total (first 10 shown)
  • [1] SAPBERT: Speaker-Aware Pretrained BERT for Emotion Recognition in Conversation
    Lim, Seunguook
    Kim, Jihie
    ALGORITHMS, 2023, 16 (01)
  • [2] Speaker-Aware Interactive Graph Attention Network for Emotion Recognition in Conversation
    Jia, Zhaohong
    Shi, Yunwei
    Liu, Weifeng
    Huang, Zhenhua
    Sun, Xiao
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (12)
  • [3] SPEAKER-AWARE SPEECH-TRANSFORMER
    Fan, Zhiyun
    Li, Jie
    Zhou, Shiyu
    Xu, Bo
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 222 - 229
  • [4] Speaker-Aware Monaural Speech Separation
    Xu, Jiahao
    Hu, Kun
    Xu, Chang
    Duc Chung Tran
    Wang, Zhiyong
    INTERSPEECH 2020, 2020, : 1451 - 1455
  • [5] Speaker-Aware Multi-Task Learning for Automatic Speech Recognition
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2900 - 2905
  • [6] Speaker-aware Cross-modal Fusion Architecture for Conversational Emotion Recognition
    Zhao, Huan
    Li, Bo
    Zhang, Zixing
    INTERSPEECH 2023, 2023, : 2718 - 2722
  • [7] Speaker-Aware Speech Enhancement with Self-Attention
    Lin, Ju
    Van Wijngaarden, Adriaan J.
    Smith, Melissa C.
    Wang, Kuang-Ching
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 486 - 490
  • [8] Speaker-aware cognitive network with cross-modal attention for multimodal emotion recognition in conversation
    Guo, Lili
    Song, Yikang
    Ding, Shifei
    KNOWLEDGE-BASED SYSTEMS, 2024, 296
  • [9] Speaker-aware neural network based beamformer for speaker extraction in speech mixtures
    Zmolikova, Katerina
    Delcroix, Marc
    Kinoshita, Keisuke
    Higuchi, Takuya
    Ogawa, Atsunori
    Nakatani, Tomohiro
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2655 - 2659
  • [10] Speaker-aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement
    Chuang, Fu-Kai
    Wang, Syu-Siang
    Hung, Jeih-weih
    Tsao, Yu
    Fang, Shih-Hau
    INTERSPEECH 2019, 2019, : 3173 - 3177