Speech Emotion Recognition Based on Attention MCNN Combined With Gender Information

Cited by: 5
Authors
Hu, Zhangfang [1 ]
LingHu, Kehuan [1 ]
Yu, Hongling [1 ]
Liao, Chenzhuo [1 ]
Affiliation
[1] Chongqing Univ Posts & Telecommun CQUPT, Key Lab Optoelect Informat Sensing & Technol, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition; Speech recognition; Mel frequency cepstral coefficient; Gender issues; Feature extraction; Convolutional neural networks; Three-dimensional displays; SER; convolutional neural network; gender information; attention; GRU; FEATURES
DOI
10.1109/ACCESS.2023.3278106
CLC Number
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Speech emotion recognition is susceptible to interference such as feature redundancy and speaker gender differences, which lowers recognition accuracy. This paper proposes a speech emotion recognition (SER) method based on an attention mixed convolutional neural network (MCNN) combined with gender information, comprising two stages: gender recognition and emotion recognition. (1) An MCNN identifies gender and classifies speech samples as male or female. (2) Based on the first-stage output, a gender-specific emotion recognition model is established by introducing coordinate attention and a series of gated recurrent units connected to an attention mechanism (A-GRUs), yielding emotion recognition results for each gender. The inputs to both stages are dynamic 3D MFCC features generated from the original speech database. The proposed method achieves 95.02% and 86.34% accuracy on the EMO-DB and RAVDESS datasets, respectively. The experimental results show that the proposed SER system combined with gender information significantly improves recognition performance.
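The abstract pins down two concrete, reproducible pieces: the dynamic 3D MFCC input (static coefficients stacked with their first- and second-order deltas as three channels) and the gender-gated two-stage inference. Below is a minimal Python sketch of both, assuming librosa for feature extraction; the sample rate, n_mfcc=40, and the model callables (gender_mcnn, female_model, male_model) are illustrative placeholders, not the paper's actual configuration.

```python
# Minimal sketch of the input pipeline and two-stage routing described
# in the abstract. The MCNN, coordinate attention, and A-GRU networks
# are defined in the paper itself and appear here only as callables.
import numpy as np
import librosa


def dynamic_3d_mfcc(wav_path, sr=16000, n_mfcc=40):
    """Stack static MFCCs with first- and second-order deltas into a
    3-channel array of shape (3, n_mfcc, frames). sr and n_mfcc are
    assumed values, not taken from the paper."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # static
    delta1 = librosa.feature.delta(mfcc, order=1)           # velocity
    delta2 = librosa.feature.delta(mfcc, order=2)           # acceleration
    return np.stack([mfcc, delta1, delta2], axis=0)


def two_stage_predict(features, gender_mcnn, female_model, male_model):
    """Stage 1: the MCNN predicts gender; stage 2: the matching
    gender-specific attention model predicts the emotion."""
    is_male = gender_mcnn(features)  # hypothetical binary classifier
    emotion_model = male_model if is_male else female_model
    return emotion_model(features)
```

Gating on the predicted gender lets each second-stage model specialize on one gender's acoustic range, which is the mechanism the abstract credits for the improved accuracy.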
Pages
50285 - 50294
Page count
10
Related Papers
50 records in total
  • [1] Speech Emotion Recognition via Sparse Learning-Based Fusion Model
    Min, Dong-Jin
    Kim, Deok-Hwan
    IEEE ACCESS, 2024, 12 : 177219 - 177235
  • [2] CochleaSpecNet: An Attention-Based Dual Branch Hybrid CNN-GRU Network for Speech Emotion Recognition Using Cochleagram and Spectrogram
    Namey, Atkia Anika
    Akter, Khadija
    Hossain, Md. Azad
    Dewan, M. Ali Akber
    IEEE ACCESS, 2024, 12 : 190760 - 190774
  • [3] Speech-Visual Emotion Recognition via Modal Decomposition Learning
    Bai, Lei
    Chang, Rui
    Chen, Guanghui
    Zhou, Yu
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1452 - 1456
  • [4] Speech Emotion Recognition Based on Robust Discriminative Sparse Regression
    Song, Peng
    Zheng, Wenming
    Yu, Yanwei
    Ou, Shifeng
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2021, 13 (02) : 343 - 353
  • [5] Exploration of an Independent Training Framework for Speech Emotion Recognition
    Zhong, Shunming
    Yu, Baoxian
    Zhang, Han
    IEEE ACCESS, 2020, 8 : 222533 - 222543
  • [6] ISNet: Individual Standardization Network for Speech Emotion Recognition
    Fan, Weiquan
    Xu, Xiangmin
    Cai, Bolun
    Xing, Xiaofen
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1803 - 1814
  • [7] Speech Emotion Recognition Based on Self-Attention Weight Correction for Acoustic and Text Features
    Santoso, Jennifer
    Yamada, Takeshi
    Ishizuka, Kenkichi
    Hashimoto, Taiichi
    Makino, Shoji
    IEEE ACCESS, 2022, 10 : 115732 - 115743
  • [8] Attention-Based Multi-Learning Approach for Speech Emotion Recognition With Dilated Convolution
    Kakuba, Samuel
    Poulose, Alwin
    Han, Dong Seog
    IEEE ACCESS, 2022, 10 : 122302 - 122313
  • [9] A Combined CNN Architecture for Speech Emotion Recognition
    Begazo, Rolinson
    Aguilera, Ana
    Dongo, Irvin
    Cardinale, Yudith
    SENSORS, 2024, 24 (17)
  • [10] Enhancing Emotion Recognition in Speech Based on Self-Supervised Learning: Cross-Attention Fusion of Acoustic and Semantic Features
    Deeb, Bashar M.
    Savchenko, Andrey V.
    Makarov, Ilya
    IEEE ACCESS, 2025, 13 : 56283 - 56295