Speaker-Aware Speech Emotion Recognition by Fusing Amplitude and Phase Information

Cited by: 1
Authors
Guo, Lili [1 ]
Wang, Longbiao [1 ]
Dang, Jianwu [1 ,2 ]
Liu, Zhilei [1 ]
Guan, Haotian [3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[3] Huiyan Technol Tianjin Co Ltd, Tianjin, Peoples R China
Source
MULTIMEDIA MODELING (MMM 2020), PT I | 2020, Vol. 11961
Funding
National Natural Science Foundation of China
Keywords
Speech emotion recognition; Amplitude spectrogram; Phase information; Modified group delay; Speaker information; CLASSIFICATION; FEATURES;
DOI
10.1007/978-3-030-37731-1_2
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The use of a convolutional neural network (CNN) for extracting deep acoustic features from spectrograms has become one of the most common approaches to speech emotion recognition. In those studies, however, only amplitude information is typically used as input, with no special attention to phase-related or speaker-related information. In this paper, we propose a multi-channel method that employs both amplitude and phase channels for speech emotion recognition. Two separate CNN channels are adopted to extract deep features from amplitude spectrograms and modified group delay (MGD) spectrograms, and a concatenation layer is then used to fuse the features. Furthermore, to obtain more robust features, speaker information is incorporated during emotional feature extraction. Finally, the speaker-aware fused features are fed into an extreme learning machine (ELM) to classify emotions. Experiments are conducted on the Emo-DB database to evaluate the proposed model. Results show an average F1 of 94.82%, which significantly outperforms the baseline CNN-ELM model based on amplitude-only spectrograms, corresponding to a 39.27% relative error reduction.
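The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: spectrogram sizes, layer widths, the speaker-embedding input, and the ELM hidden width are all assumed placeholders.

# Hypothetical sketch of the two-channel (amplitude + MGD) fusion model with an
# ELM classifier, as outlined in the abstract. All hyperparameters are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def cnn_branch(name, input_shape=(128, 128, 1)):
    """One CNN channel that maps a spectrogram to a deep feature vector."""
    inp = layers.Input(shape=input_shape, name=f"{name}_spectrogram")
    x = layers.Conv2D(32, (3, 3), activation="relu")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)
    return inp, layers.Dense(256, activation="relu", name=f"{name}_features")(x)

# Two separate CNN channels: amplitude spectrogram and MGD (phase-based) spectrogram.
amp_in, amp_feat = cnn_branch("amplitude")
mgd_in, mgd_feat = cnn_branch("mgd")

# Concatenation-layer fusion of the two feature streams; a speaker embedding
# (placeholder input) is appended so the fused representation is speaker-aware.
spk_in = layers.Input(shape=(64,), name="speaker_embedding")
fused = layers.Concatenate(name="fusion")([amp_feat, mgd_feat, spk_in])
extractor = Model(inputs=[amp_in, mgd_in, spk_in], outputs=fused)

def elm_fit(features, labels, hidden=1024, seed=0):
    """Minimal extreme learning machine: random hidden layer + least-squares output."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((features.shape[1], hidden))
    b = rng.standard_normal(hidden)
    h = np.tanh(features @ w + b)        # random nonlinear projection
    beta = np.linalg.pinv(h) @ labels    # closed-form output weights (labels one-hot)
    return w, b, beta

def elm_predict(features, w, b, beta):
    return np.tanh(features @ w + b) @ beta

In use, the deep fused features produced by extractor would be fed to elm_fit for training and elm_predict for classification, mirroring the CNN-feature-plus-ELM split described above.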
Pages: 14-25
Page count: 12
Related Papers
50 items in total
  • [41] Graph Learning Based Speaker Independent Speech Emotion Recognition
    Xu, Xinzhou
    Huang, Chengwei
    Wu, Chen
    Wang, Qingyun
    Zhao, Li
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2014, 14 (02) : 17 - 22
  • [42] RobinNet: A Multimodal Speech Emotion Recognition System With Speaker Recognition for Social Interactions
    Khurana, Yash
    Gupta, Swamita
    Sathyaraj, R.
    Raja, S. P.
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2022, 11 (01) : 478 - 487
  • [43] Speech-Visual Emotion Recognition by Fusing Shared and Specific Features
    Chen, Guanghui
    Jiao, Shuang
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 678 - 682
  • [44] Human emotion recognition by optimally fusing facial expression and speech feature
    Wang, Xusheng
    Chen, Xing
    Cao, Congjun
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 84
  • [46] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
    Fahad, Md Shah
    Ranjan, Ashish
    Deepak, Akshay
    Pradhan, Gayadhar
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (11) : 6113 - 6135
  • [47] Exploitation of Phase Information for Speaker Recognition
    Wang, Ning
    Ching, P. C.
    Lee, Tan
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2126 - 2129
  • [48] Information access using speech, speaker and face recognition
    Viswanathan, M
    Beigi, HSM
    Tritschler, A
    Maali, F
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 493 - 496
  • [49] SPEECH EMOTION RECOGNITION USING SEMANTIC INFORMATION
    Tzirakis, Panagiotis
    Anh Nguyen
    Zafeiriou, Stefanos
    Schuller, Bjoern W.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6279 - 6283
  • [50] A Study on the Search of the Most Discriminative Speech Features in the Speaker Dependent Speech Emotion Recognition
    Pao, Tsang-Long
    Wang, Chun-Hsiang
    Li, Yu-Ji
    2012 FIFTH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND PROGRAMMING (PAAP), 2012, : 157 - 162