Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions

Cited by: 0
Authors
Yang Liu
Haoqin Sun
Wenbo Guan
Yuqi Xia
Zhen Zhao
Affiliation
[1] School of Information Science and Technology, Qingdao University of Science and Technology
Source
Machine Intelligence Research, 2023, Vol. 20
Keywords
Speech emotion recognition (SER); 3-dimensional (3D) feature; cascaded attention network (CAN); triplet loss; joint loss
DOI
Not available
Abstract
Due to the complexity of emotional expression, recognizing emotions from speech is a critical and challenging task. In many studies, certain emotions are easily misclassified. In this paper, we propose a new framework that integrates a cascaded attention mechanism and a joint loss for speech emotion recognition (SER), aiming to resolve feature confusion among emotions that are difficult to classify correctly. First, we extract mel-frequency cepstral coefficients (MFCCs) together with their deltas and delta-deltas to form 3-dimensional (3D) features, effectively reducing the interference of external factors. Second, we employ spatiotemporal attention to selectively discover target emotion regions in the input features, where self-attention with head fusion captures the long-range dependencies of temporal features. Finally, a joint loss function is employed to distinguish emotional embeddings with high similarity, enhancing overall performance. Experiments on the interactive emotional dyadic motion capture (IEMOCAP) database indicate that the method improves weighted accuracy (WA) by 2.49% and unweighted accuracy (UA) by 1.13% over state-of-the-art strategies.
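
As a minimal sketch of the 3D input described in the abstract, the static MFCCs and their first- and second-order dynamics can be stacked into three channels. The sampling rate, the choice of 40 coefficients, and the use of librosa below are assumptions for illustration, not the paper's reported settings.

    import librosa
    import numpy as np

    def extract_3d_mfcc(wav_path, sr=16000, n_mfcc=40):
        # Load the waveform and compute static MFCCs: shape (n_mfcc, T)
        y, sr = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        # First- and second-order dynamics (deltas and delta-deltas)
        delta1 = librosa.feature.delta(mfcc, order=1)
        delta2 = librosa.feature.delta(mfcc, order=2)
        # Stack into a 3-channel input of shape (3, n_mfcc, T)
        return np.stack([mfcc, delta1, delta2], axis=0)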
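
The self-attention component can be illustrated with a layer whose heads are fused rather than concatenated. The abstract does not specify the fusion rule, so averaging the per-head outputs in this PyTorch sketch is an assumption about the design.

    import torch
    import torch.nn as nn

    class HeadFusionSelfAttention(nn.Module):
        def __init__(self, dim, num_heads=4):
            super().__init__()
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            self.qkv = nn.Linear(dim, 3 * dim)
            self.out = nn.Linear(self.head_dim, dim)

        def forward(self, x):  # x: (batch, T, dim), T = time frames
            b, t, d = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # Reshape to (batch, heads, T, head_dim)
            q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                       for z in (q, k, v))
            # Scaled dot-product attention over the time axis
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
            heads = attn @ v                 # (batch, heads, T, head_dim)
            fused = heads.mean(dim=1)        # head fusion: average the heads
            return self.out(fused)           # (batch, T, dim)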
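
Given the triplet-loss keyword, a plausible reading of the joint loss is softmax cross-entropy for classification plus a triplet margin term that separates highly similar emotional embeddings; the weighting factor and margin below are assumptions.

    import torch.nn as nn

    class JointLoss(nn.Module):
        def __init__(self, margin=1.0, weight=0.5):
            super().__init__()
            self.ce = nn.CrossEntropyLoss()
            self.triplet = nn.TripletMarginLoss(margin=margin)
            self.weight = weight  # assumed trade-off between the two terms

        def forward(self, logits, labels, anchor, positive, negative):
            # anchor/positive share an emotion label; negative has a different one
            return self.ce(logits, labels) + self.weight * self.triplet(
                anchor, positive, negative)

In training, the anchor and positive embeddings would come from utterances of the same emotion, while the negative would be drawn from a frequently confused class, which is how such a term could sharpen the boundaries between easily confused emotions.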
Pages: 595–604 (9 pages)