Speech Emotion Recognition Based on Attention MCNN Combined With Gender Information

Cited by: 5
Authors
Hu, Zhangfang [1 ]
LingHu, Kehuan [1 ]
Yu, Hongling [1 ]
Liao, Chenzhuo [1 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun CQUPT, Key Lab Optoelect Informat Sensing & Technol, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition; Speech recognition; Mel frequency cepstral coefficient; Gender issues; Feature extraction; Convolutional neural networks; Three-dimensional displays; SER; convolutional neural network; gender information; attention; GRU; FEATURES;
DOI
10.1109/ACCESS.2023.3278106
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Emotion recognition is susceptible to interference such as feature redundancy and speaker gender differences, resulting in low recognition accuracy. This paper proposes a speech emotion recognition (SER) method based on an attention mixed convolutional neural network (MCNN) combined with gender information, comprising two stages: gender recognition and emotion recognition. (1) An MCNN identifies gender and classifies speech samples as male or female. (2) Based on the first-stage classification, gender-specific emotion recognition models are established by introducing coordinate attention and a stack of gated recurrent units connected to an attention mechanism (A-GRUs), yielding emotion recognition results for each gender. The inputs to both stages are dynamic 3D MFCC features generated from the original speech database. The proposed method achieves 95.02% and 86.34% accuracy on the EMO-DB and RAVDESS datasets, respectively. The experimental results show that the proposed SER system combined with gender information significantly improves recognition performance.
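The pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the delta computation here uses a simple gradient stand-in for the regression-based deltas typically used with MFCCs, and the three network callables (`gender_net`, `emotion_net_male`, `emotion_net_female`) are hypothetical placeholders for the MCNN and A-GRU models.

```python
import numpy as np

def delta(feat: np.ndarray) -> np.ndarray:
    """First-order difference along the time axis (axis=1); a simple
    stand-in for the regression-based delta commonly used in SER."""
    return np.gradient(feat, axis=1)

def make_3d_mfcc(mfcc: np.ndarray) -> np.ndarray:
    """Stack static, delta, and delta-delta MFCCs into a 3-channel
    'image' of shape (3, n_mfcc, n_frames) -- the dynamic 3D feature
    the paper feeds to both the gender and emotion networks."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return np.stack([mfcc, d1, d2], axis=0)

def two_stage_predict(x, gender_net, emotion_net_male, emotion_net_female):
    """Stage 1 routes the sample by predicted gender; stage 2 applies
    the gender-specific emotion model (all three are hypothetical
    callables standing in for trained networks)."""
    net = emotion_net_male if gender_net(x) == "male" else emotion_net_female
    return net(x)
```

Routing by predicted gender first lets each second-stage model specialize on one speaker population, which is the mechanism the paper credits for the accuracy gain.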
Pages: 50285-50294
Page count: 10
Related Papers
50 records in total
  • [41] Gender Specific Emotion Recognition Through Speech Signals
    Vinay
    Gupta, Shilpi
    Mehra, Anu
    2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2014, : 727 - 733
  • [42] Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
    Jo, A-Hyeon
    Kwak, Keun-Chang
    IEEE ACCESS, 2025, 13 : 19947 - 19963
  • [43] AMDET: Attention Based Multiple Dimensions EEG Transformer for Emotion Recognition
    Xu, Yongling
    Du, Yang
    Li, Ling
    Lai, Honghao
    Zou, Jing
    Zhou, Tianying
    Xiao, Lushan
    Liu, Li
    Ma, Pengcheng
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (03) : 1067 - 1077
  • [44] Speech Emotion Recognition via Generation using an Attention-based Variational Recurrent Neural Network
    Baruah, Murchana
    Banerjee, Bonny
    INTERSPEECH 2022, 2022, : 4710 - 4714
  • [45] Efficient Feature-Aware Hybrid Model of Deep Learning Architectures for Speech Emotion Recognition
    Ezz-Eldin, Mai
    Khalaf, Ashraf A. M.
    Hamed, Hesham F. A.
    Hussein, Aziza I.
    IEEE ACCESS, 2021, 9 : 19999 - 20011
  • [46] An End-To-End Emotion Recognition Framework Based on Temporal Aggregation of Multimodal Information
    Radoi, Anamaria
    Birhala, Andreea
    Ristea, Nicolae-Catalin
    Dutu, Liviu-Cristian
    IEEE ACCESS, 2021, 9 : 135559 - 135570
  • [47] Effective Exploitation of Posterior Information for Attention-Based Speech Recognition
    Tang, Jian
    Hou, Junfeng
    Song, Yan
    Dai, Li-Rong
    McLoughlin, Ian
    IEEE ACCESS, 2020, 8 (08): 108988 - 108999
  • [48] Speech Based Human Emotion Recognition Using MFCC
    Likitha, M. S.
    Gupta, Raksha R.
    Hasitha, K.
    Raju, A. Upendra
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2017, : 2257 - 2260
  • [49] CONTEXT-AWARE ATTENTION MECHANISM FOR SPEECH EMOTION RECOGNITION
    Ramet, Gaetan
    Garner, Philip N.
    Baeriswyl, Michael
    Lazaridis, Alexandros
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 126 - 131
  • [50] SPEECH EMOTION RECOGNITION WITH MULTISCALE AREA ATTENTION AND DATA AUGMENTATION
    Xu, Mingke
    Zhang, Fan
    Cui, Xiaodong
    Zhang, Wei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6319 - 6323