TWACapsNet: a capsule network with two-way attention mechanism for speech emotion recognition

Cited by: 0
Authors
Wen, Xin-Cheng [1 ]
Liu, Kun-Hong [2 ]
Luo, Yan [3 ]
Ye, Jiaxin [4 ]
Chen, Liyan [2 ]
Affiliations
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci, Shenzhen, Peoples R China
[2] Xiamen Univ, Sch Film, Xiamen, Peoples R China
[3] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
[4] Fudan Univ, Inst Sci & Technol Brain inspired Intelligence, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech emotion recognition; Attention mechanism; Neural networks; FEATURES; CNN;
DOI
10.1007/s00500-023-08957-5
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Code
081104; 0812; 0835; 1405;
Abstract
Speech Emotion Recognition (SER) is a challenging task, and a typical convolutional neural network (CNN) cannot handle speech data well on its own, because a CNN tends to capture local information while ignoring global characteristics. This paper proposes a Capsule Network with a Two-Way Attention Mechanism (TWACapsNet for short) for the SER problem. TWACapsNet accepts spatial and spectral features as inputs, and a convolutional layer and a capsule layer process these two types of features in two separate ways. Two attention mechanisms are then designed to enhance the information obtained from the spatial and spectral features, and the outputs of the two ways are finally combined to form the final decision. The advantage of TWACapsNet is verified by experiments on multiple SER data sets, and the results show that the proposed method outperforms widely deployed neural network models on three typical SER data sets. Furthermore, the combination of the two ways yields higher and more stable performance for TWACapsNet.
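For concreteness, below is a minimal PyTorch sketch of the two-way idea described in the abstract: two parallel branches (one for spatial features, one for spectral features), each with a convolutional stage, a simplified capsule-style projection using the usual squashing nonlinearity, and an attention weighting over capsules, fused for the final decision. The class names (Branch, TwoWayNet), layer sizes, the capsule layer, and the attention form are illustrative assumptions and do not reproduce the authors' TWACapsNet implementation.

```python
# Illustrative sketch only; not the authors' TWACapsNet code.
import torch
import torch.nn as nn


def squash(x, dim=-1, eps=1e-8):
    """Squashing nonlinearity commonly used in capsule networks."""
    sq_norm = (x ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * x / torch.sqrt(sq_norm + eps)


class Branch(nn.Module):
    """One processing way: conv stage -> capsule-style projection -> attention pooling."""

    def __init__(self, in_ch=1, n_caps=32, caps_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.to_caps = nn.Linear(64 * 8 * 8, n_caps * caps_dim)
        self.attn = nn.Linear(caps_dim, 1)  # simple additive attention over capsules
        self.n_caps, self.caps_dim = n_caps, caps_dim

    def forward(self, x):
        h = self.conv(x).flatten(1)                                  # (B, 64*8*8)
        caps = squash(self.to_caps(h).view(-1, self.n_caps, self.caps_dim))
        w = torch.softmax(self.attn(caps), dim=1)                    # (B, n_caps, 1)
        return (w * caps).sum(dim=1)                                 # attention-pooled capsule vector


class TwoWayNet(nn.Module):
    """Fuses the two attention-enhanced ways into one classification decision."""

    def __init__(self, n_classes=4):
        super().__init__()
        self.spatial_branch = Branch(in_ch=1)
        self.spectral_branch = Branch(in_ch=1)
        self.classifier = nn.Linear(2 * 16, n_classes)

    def forward(self, spatial_feat, spectral_feat):
        fused = torch.cat(
            [self.spatial_branch(spatial_feat), self.spectral_branch(spectral_feat)],
            dim=1,
        )
        return self.classifier(fused)


if __name__ == "__main__":
    # Dummy spectrogram-like inputs of shape (batch, 1, time, frequency).
    model = TwoWayNet(n_classes=4)
    spatial = torch.randn(2, 1, 64, 64)
    spectral = torch.randn(2, 1, 64, 64)
    print(model(spatial, spectral).shape)  # torch.Size([2, 4])
```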
Pages: 8701-8713
Number of pages: 13