Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions

Authors
Yang Liu
Haoqin Sun
Wenbo Guan
Yuqi Xia
Zhen Zhao
Affiliation
[1] Qingdao University of Science and Technology, School of Information Science and Technology
Source
Machine Intelligence Research, 2023, Vol. 20, No. 4
Keywords
Speech emotion recognition (SER); 3-dimensional (3D) feature; cascaded attention network (CAN); triplet loss; joint loss
DOI
Not available
Abstract
Due to the complexity of emotional expression, recognizing emotions from speech is a critical and challenging task. In most existing studies, certain emotions are frequently misclassified. In this paper, we propose a new framework that integrates a cascaded attention mechanism and a joint loss for speech emotion recognition (SER), aiming to resolve feature confusion among emotions that are difficult to classify correctly. First, we extract mel-frequency cepstral coefficients (MFCCs), together with their deltas and delta-deltas, and combine them into 3-dimensional (3D) features, which effectively reduces the interference of external factors. Second, we employ spatiotemporal attention to selectively locate target emotion regions in the input features, where self-attention with head fusion captures long-range dependencies in the temporal features. Finally, a joint loss function is employed to separate highly similar emotional embeddings and thereby enhance overall performance. Experiments on the interactive emotional dyadic motion capture (IEMOCAP) database show that the method improves weighted accuracy (WA) and unweighted accuracy (UA) by 2.49% and 1.13%, respectively, compared with state-of-the-art strategies.
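The abstract names three concrete ingredients: stacking MFCCs with their deltas and delta-deltas into a 3D feature, a joint loss that (per the keywords) involves a triplet term, and evaluation by WA and UA. The pure-Python sketch below illustrates each one under stated assumptions; it is not the authors' implementation and omits the cascaded attention network entirely. Assumptions: deltas use the common regression formula with a window of N=2 and edge-replicated padding; the joint loss is read as softmax cross-entropy plus a weighted triplet loss, where the weight `lam` and `margin` are hypothetical; and WA/UA follow the convention commonly used in IEMOCAP work (WA is overall accuracy, UA is the mean of per-class recalls).

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows are frames, columns are coefficients


def deltas(feat: Matrix, n: int = 2) -> Matrix:
    """Regression deltas over time (assumed window n, edge-replicated padding).

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2)
    """
    T = len(feat)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        row = []
        for j in range(len(feat[t])):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = feat[min(t + k, T - 1)][j]  # clamp at the last frame
                prv = feat[max(t - k, 0)][j]      # clamp at the first frame
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_3d(mfcc: Matrix, n: int = 2) -> List[Matrix]:
    """Return [static, delta, delta-delta] as three channels of equal shape."""
    d1 = deltas(mfcc, n)
    d2 = deltas(d1, n)
    return [mfcc, d1, d2]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def joint_loss(logits: Sequence[float], label: int, anchor: Sequence[float],
               positive: Sequence[float], negative: Sequence[float],
               lam: float = 0.5, margin: float = 1.0) -> float:
    """Hypothetical joint loss: cross-entropy plus lam times the triplet loss."""
    return cross_entropy(logits, label) + lam * triplet_loss(
        anchor, positive, negative, margin)


def wa_ua(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """WA = overall accuracy; UA = mean recall over classes present in y_true."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    wa = correct / len(y_true)
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    ua = sum(recalls) / len(recalls)
    return wa, ua
```

As a usage example, `wa_ua([0, 0, 0, 1], [0, 0, 0, 0])` (predicting the majority class for every utterance) gives WA = 0.75 but UA = 0.5, which is why SER results on class-imbalanced corpora such as IEMOCAP are usually reported with both metrics.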
Pages: 595-604 (10 pages)