Self-attention transfer networks for speech emotion recognition

Cited by: 3
Authors
Ziping ZHAO [1]
Keru WANG [1]
Zhongtian BAO [1]
Zixing ZHANG [2]
Nicholas CUMMINS [3, 4]
Shihuang SUN [5]
Haishuai WANG [5]
Jianhua TAO [6]
Björn W. SCHULLER [1, 2, 3]
Affiliations
[1] College of Computer and Information Engineering, Tianjin Normal University
[2] GLAM - Group on Language, Audio & Music, Imperial College London
[3] Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg
[4] Department of Biostatistics and Health Informatics, IoPPN, King's College London
[5] Department of Computer Science and Engineering, Fairfield University
[6] National Laboratory of Pattern Recognition, CASIA
Funding
EU Horizon 2020; National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TN912.34 [Speech recognition and equipment]
Abstract
Background: The automatic detection of emotional states from human speech is a crucial element of human-machine interaction and has long been regarded as a challenging task for machine learning models. One vital challenge in speech emotion recognition (SER) is learning robust and discriminative representations from speech. Although machine learning methods have been widely applied in SER research, the limited amount of available annotated data has become a bottleneck that impedes the wider application of such techniques (e.g., deep neural networks). To address this issue, we present a deep learning method that combines knowledge transfer and self-attention for SER tasks. We use the log-Mel spectrogram with deltas and delta-deltas as input. Moreover, given that emotions are time-dependent, we apply temporal convolutional neural networks to model the variations in emotions. We further introduce an attention transfer mechanism, which is based on a self-attention algorithm, to learn long-term dependencies. The self-attention transfer network (SATN) in our proposed approach uses attention transfer to learn attention from speech recognition and then transfers this knowledge to SER. An evaluation on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset demonstrates the effectiveness of the proposed model.
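The ingredients named in the abstract (a log-Mel input stacked with deltas and delta-deltas, self-attention over time, and an attention transfer objective) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the central-difference delta, the single-head scaled dot-product attention, the mean-squared-error transfer loss, and all function names below are assumptions chosen for illustration, and the temporal convolutional layers are omitted.

```python
import numpy as np


def deltas(feats: np.ndarray) -> np.ndarray:
    """Temporal derivative along axis 0 (time), via central differences.

    Assumption: speech toolkits usually use a regression window instead;
    a plain finite difference is used here to keep the sketch short.
    """
    return np.gradient(feats, axis=0)


def stack_with_deltas(log_mel: np.ndarray) -> np.ndarray:
    """Stack static, delta and delta-delta features as 3 channels.

    log_mel has shape (T, n_mels); the result has shape (3, T, n_mels).
    """
    d1 = deltas(log_mel)
    d2 = deltas(d1)
    return np.stack([log_mel, d1, d2], axis=0)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over T frames.

    x: (T, d_in); w_q, w_k, w_v: (d_in, d_k). Returns (output, attention map).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)  # each row sums to 1
    return attn @ v, attn


def attention_transfer_loss(student_attn, teacher_attn) -> float:
    """Hypothetical transfer objective: MSE between two attention maps.

    Assumption: the teacher map would come from a speech recognition model
    and the student map from the emotion model; the paper's actual loss
    may differ.
    """
    return float(np.mean((student_attn - teacher_attn) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_mel = rng.standard_normal((50, 40))  # 50 frames, 40 Mel bands
    print(stack_with_deltas(log_mel).shape)  # (3, 50, 40)
```

In a setup like this, the transfer loss would typically be added to the emotion classification loss with a weighting factor, so that the emotion model is nudged toward the attention pattern learned on the recognition task.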
Pages: 43-54
Number of pages: 12
Related papers
50 in total
  • [1] Self-attention for Speech Emotion Recognition
    Tarantino, Lorenzo
    Garner, Philip N.
    Lazaridis, Alexandros
    INTERSPEECH 2019, 2019: 2578-2582
  • [2] Speech emotion recognition using recurrent neural networks with directional self-attention
    Li, Dongdong
    Liu, Jinlin
    Yang, Zhuo
    Sun, Linyu
    Wang, Zhe
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 173
  • [3] Combining Gated Convolutional Networks and Self-Attention Mechanism for Speech Emotion Recognition
    Li, Chao
    Jiao, Jinlong
    Zhao, Yiqin
    Zhao, Ziping
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW), 2019: 105-109
  • [4] BAT: Block and token self-attention for speech emotion recognition
    Lei, Jianjun
    Zhu, Xiangwei
    Wang, Ying
    Neural Networks, 2022, 156: 67-80
  • [5] BAT: Block and token self-attention for speech emotion recognition
    Lei, Jianjun
    Zhu, Xiangwei
    Wang, Ying
    NEURAL NETWORKS, 2022, 156: 67-80
  • [6] NEPALI SPEECH RECOGNITION USING SELF-ATTENTION NETWORKS
    Joshi, Basanta
    Shrestha, Rupesh
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2023, 19 (06): 1769-1784
  • [7] MULTIMODAL CROSS- AND SELF-ATTENTION NETWORK FOR SPEECH EMOTION RECOGNITION
    Sun, Licai
    Liu, Bin
    Tao, Jianhua
    Lian, Zheng
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 4275-4279
  • [8] SELF-ATTENTION NETWORKS FOR CONNECTIONIST TEMPORAL CLASSIFICATION IN SPEECH RECOGNITION
    Salazar, Julian
    Kirchhoff, Katrin
    Huang, Zhiheng
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 7115-7119
  • [9] Enhancing speech emotion recognition: a deep learning approach with self-attention and acoustic features
    Aghajani, Khadijeh
    Zohrevandi, Mahbanou
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (05)
  • [10] Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks
    Lian, Zheng
    Tao, Jianhua
    Liu, Bin
    Huang, Jian
    Yang, Zhanlei
    Li, Rongjun
    INTERSPEECH 2020, 2020: 2347-2351