Speech Emotion Recognition Based on Attention MCNN Combined With Gender Information

Cited by: 5
Authors
Hu, Zhangfang [1 ]
LingHu, Kehuan [1 ]
Yu, Hongling [1 ]
Liao, Chenzhuo [1 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun CQUPT, Key Lab Optoelect Informat Sensing & Technol, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition; Speech recognition; Mel frequency cepstral coefficient; Gender issues; Feature extraction; Convolutional neural networks; Three-dimensional displays; SER; convolutional neural network; gender information; attention; GRU; FEATURES;
DOI
10.1109/ACCESS.2023.3278106
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Emotion recognition is susceptible to interference such as feature redundancy and speaker gender differences, resulting in low recognition accuracy. This paper proposes a speech emotion recognition (SER) method based on an attention mixed convolutional neural network (MCNN) combined with gender information, comprising two stages: gender recognition and emotion recognition. (1) An MCNN identifies gender and classifies speech samples as male or female. (2) Based on the first-stage classification, gender-specific emotion recognition models are established by introducing coordinate attention and a stack of gated recurrent units connected to an attention mechanism (A-GRUs), yielding emotion recognition results for each gender. The inputs to both stages are dynamic 3D MFCC features generated from the original speech database. The proposed method achieves 95.02% and 86.34% accuracy on the EMO-DB and RAVDESS datasets, respectively. The experimental results show that the proposed SER system combined with gender information significantly improves recognition performance.
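The pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the delta computation here uses a simple gradient stand-in for the regression-based deltas typically used with MFCCs, and the three network callables (`gender_net`, `emotion_net_male`, `emotion_net_female`) are hypothetical placeholders for the MCNN and A-GRU models.

```python
import numpy as np

def delta(feat: np.ndarray) -> np.ndarray:
    """First-order difference along the time axis (axis=1); a simple
    stand-in for the regression-based delta commonly used in SER."""
    return np.gradient(feat, axis=1)

def make_3d_mfcc(mfcc: np.ndarray) -> np.ndarray:
    """Stack static, delta, and delta-delta MFCCs into a 3-channel
    'image' of shape (3, n_mfcc, n_frames) -- the dynamic 3D feature
    the paper feeds to both the gender and emotion networks."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return np.stack([mfcc, d1, d2], axis=0)

def two_stage_predict(x, gender_net, emotion_net_male, emotion_net_female):
    """Stage 1 routes the sample by predicted gender; stage 2 applies
    the gender-specific emotion model (all three are hypothetical
    callables standing in for trained networks)."""
    net = emotion_net_male if gender_net(x) == "male" else emotion_net_female
    return net(x)
```

Routing by predicted gender first lets each second-stage model specialize on one speaker population, which is the mechanism the paper credits for the accuracy gain.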
Pages: 50285-50294
Page count: 10
Related Papers
50 records in total
  • [41] Gender Specific Emotion Recognition Through Speech Signals
    Vinay
    Gupta, Shilpi
    Mehra, Anu
    2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2014, : 727 - 733
  • [42] Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data
    Jo, A-Hyeon
    Kwak, Keun-Chang
    IEEE ACCESS, 2025, 13 : 19947 - 19963
  • [43] AMDET: Attention Based Multiple Dimensions EEG Transformer for Emotion Recognition
    Xu, Yongling
    Du, Yang
    Li, Ling
    Lai, Honghao
    Zou, Jing
    Zhou, Tianying
    Xiao, Lushan
    Liu, Li
    Ma, Pengcheng
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (03) : 1067 - 1077
  • [44] Speech Emotion Recognition via Generation using an Attention-based Variational Recurrent Neural Network
    Baruah, Murchana
    Banerjee, Bonny
    INTERSPEECH 2022, 2022, : 4710 - 4714
  • [45] Efficient Feature-Aware Hybrid Model of Deep Learning Architectures for Speech Emotion Recognition
    Ezz-Eldin, Mai
    Khalaf, Ashraf A. M.
    Hamed, Hesham F. A.
    Hussein, Aziza I.
    IEEE ACCESS, 2021, 9 : 19999 - 20011
  • [46] An End-To-End Emotion Recognition Framework Based on Temporal Aggregation of Multimodal Information
    Radoi, Anamaria
    Birhala, Andreea
    Ristea, Nicolae-Catalin
    Dutu, Liviu-Cristian
    IEEE ACCESS, 2021, 9 : 135559 - 135570
  • [47] Effective Exploitation of Posterior Information for Attention-Based Speech Recognition
    Tang, Jian
    Hou, Junfeng
    Song, Yan
    Dai, Li-Rong
    McLoughlin, Ian
    IEEE ACCESS, 2020, 8 (08): 108988 - 108999
  • [48] Speech Based Human Emotion Recognition Using MFCC
    Likitha, M. S.
    Gupta, Raksha R.
    Hasitha, K.
    Raju, A. Upendra
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2017, : 2257 - 2260
  • [49] CONTEXT-AWARE ATTENTION MECHANISM FOR SPEECH EMOTION RECOGNITION
    Ramet, Gaetan
    Garner, Philip N.
    Baeriswyl, Michael
    Lazaridis, Alexandros
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 126 - 131
  • [50] SPEECH EMOTION RECOGNITION WITH MULTISCALE AREA ATTENTION AND DATA AUGMENTATION
    Xu, Mingke
    Zhang, Fan
    Cui, Xiaodong
    Zhang, Wei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6319 - 6323