Speech Emotion Recognition Based on Attention MCNN Combined With Gender Information

Cited by: 5
Authors
Hu, Zhangfang [1 ]
LingHu, Kehuan [1 ]
Yu, Hongling [1 ]
Liao, Chenzhuo [1 ]
Affiliation
[1] Chongqing Univ Posts & Telecommun CQUPT, Key Lab Optoelect Informat Sensing & Technol, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition; Speech recognition; Mel frequency cepstral coefficient; Gender issues; Feature extraction; Convolutional neural networks; Three-dimensional displays; SER; convolutional neural network; gender information; attention; GRU; FEATURES
DOI
10.1109/ACCESS.2023.3278106
CLC Number
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Speech emotion recognition is susceptible to interference such as feature redundancy and speaker gender differences, which lowers recognition accuracy. This paper proposes a speech emotion recognition (SER) method based on an attention mixed convolutional neural network (MCNN) combined with gender information, comprising two stages: gender recognition and emotion recognition. (1) An MCNN identifies gender and classifies speech samples as male or female. (2) Based on the first-stage output, a gender-specific emotion recognition model is established by introducing coordinate attention and a series of gated recurrent units connected to an attention mechanism (A-GRUs), yielding emotion recognition results for each gender. The inputs to both stages are dynamic 3D MFCC features generated from the original speech database. The proposed method achieves 95.02% and 86.34% accuracy on the EMO-DB and RAVDESS datasets, respectively. The experimental results show that the proposed SER system combined with gender information significantly improves recognition performance.
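The abstract pins down two concrete, reproducible pieces: the dynamic 3D MFCC input (static coefficients stacked with their first- and second-order deltas as three channels) and the gender-gated two-stage inference. Below is a minimal Python sketch of both, assuming librosa for feature extraction; the sample rate, n_mfcc=40, and the model callables (gender_mcnn, female_model, male_model) are illustrative placeholders, not the paper's actual configuration.

```python
# Minimal sketch of the input pipeline and two-stage routing described
# in the abstract. The MCNN, coordinate attention, and A-GRU networks
# are defined in the paper itself and appear here only as callables.
import numpy as np
import librosa


def dynamic_3d_mfcc(wav_path, sr=16000, n_mfcc=40):
    """Stack static MFCCs with first- and second-order deltas into a
    3-channel array of shape (3, n_mfcc, frames). sr and n_mfcc are
    assumed values, not taken from the paper."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # static
    delta1 = librosa.feature.delta(mfcc, order=1)           # velocity
    delta2 = librosa.feature.delta(mfcc, order=2)           # acceleration
    return np.stack([mfcc, delta1, delta2], axis=0)


def two_stage_predict(features, gender_mcnn, female_model, male_model):
    """Stage 1: the MCNN predicts gender; stage 2: the matching
    gender-specific attention model predicts the emotion."""
    is_male = gender_mcnn(features)  # hypothetical binary classifier
    emotion_model = male_model if is_male else female_model
    return emotion_model(features)
```

Gating on the predicted gender lets each second-stage model specialize on one gender's acoustic range, which is the mechanism the abstract credits for the improved accuracy.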
Pages
50285 - 50294
Page count
10
Related Papers
50 records in total
  • [1] Speech Emotion Recognition via Sparse Learning-Based Fusion Model
    Min, Dong-Jin
    Kim, Deok-Hwan
    IEEE ACCESS, 2024, 12 : 177219 - 177235
  • [2] CochleaSpecNet: An Attention-Based Dual Branch Hybrid CNN-GRU Network for Speech Emotion Recognition Using Cochleagram and Spectrogram
    Namey, Atkia Anika
    Akter, Khadija
    Hossain, Md. Azad
    Dewan, M. Ali Akber
    IEEE ACCESS, 2024, 12 : 190760 - 190774
  • [3] Speech-Visual Emotion Recognition via Modal Decomposition Learning
    Bai, Lei
    Chang, Rui
    Chen, Guanghui
    Zhou, Yu
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1452 - 1456
  • [4] Speech Emotion Recognition Based on Robust Discriminative Sparse Regression
    Song, Peng
    Zheng, Wenming
    Yu, Yanwei
    Ou, Shifeng
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2021, 13 (02) : 343 - 353
  • [5] Exploration of an Independent Training Framework for Speech Emotion Recognition
    Zhong, Shunming
    Yu, Baoxian
    Zhang, Han
    IEEE ACCESS, 2020, 8 : 222533 - 222543
  • [6] ISNet: Individual Standardization Network for Speech Emotion Recognition
    Fan, Weiquan
    Xu, Xiangmin
    Cai, Bolun
    Xing, Xiaofen
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1803 - 1814
  • [7] Speech Emotion Recognition Based on Self-Attention Weight Correction for Acoustic and Text Features
    Santoso, Jennifer
    Yamada, Takeshi
    Ishizuka, Kenkichi
    Hashimoto, Taiichi
    Makino, Shoji
    IEEE ACCESS, 2022, 10 : 115732 - 115743
  • [8] Attention-Based Multi-Learning Approach for Speech Emotion Recognition With Dilated Convolution
    Kakuba, Samuel
    Poulose, Alwin
    Han, Dong Seog
    IEEE ACCESS, 2022, 10 : 122302 - 122313
  • [9] A Combined CNN Architecture for Speech Emotion Recognition
    Begazo, Rolinson
    Aguilera, Ana
    Dongo, Irvin
    Cardinale, Yudith
    SENSORS, 2024, 24 (17)
  • [10] Enhancing Emotion Recognition in Speech Based on Self-Supervised Learning: Cross-Attention Fusion of Acoustic and Semantic Features
    Deeb, Bashar M.
    Savchenko, Andrey V.
    Makarov, Ilya
    IEEE ACCESS, 2025, 13 : 56283 - 56295