Self-attention transfer networks for speech emotion recognition

Cited: 3
Authors
Ziping ZHAO [1 ]
Keru WANG [1 ]
Zhongtian BAO [1 ]
Zixing ZHANG [2 ]
Nicholas CUMMINS [3 ,4 ]
Shihuang SUN [5 ]
Haishuai WANG [5 ]
Jianhua TAO [6 ]
Björn W. SCHULLER [1 ,2 ,3 ]
Affiliations
[1] College of Computer and Information Engineering, Tianjin Normal University
[2] GLAM-Group on Language, Audio & Music, Imperial College London
[3] Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg
[4] Department of Biostatistics and Health Informatics, IoPPN, King's College London
[5] Department of Computer Science and Engineering, Fairfield University
[6] National Laboratory of Pattern Recognition, CASIA
Funding
EU Horizon 2020; National Natural Science Foundation of China;
Keywords
DOI
None available
CLC number
TN912.34 [speech recognition and equipment];
Subject classification code
Abstract
Background A crucial element of human-machine interaction, the automatic detection of emotional states from human speech has long been regarded as a challenging task for machine learning models. One vital challenge in speech emotion recognition (SER) is learning robust and discriminative representations from speech. Although machine learning methods have been widely applied in SER research, the inadequate amount of available annotated data has become a bottleneck impeding the extended application of such techniques (e.g., deep neural networks). To address this issue, we present a deep learning method that combines knowledge transfer and self-attention for SER tasks. Herein, we apply the log-Mel spectrogram with deltas and delta-deltas as inputs. Moreover, given that emotions are time-dependent, we apply temporal convolutional neural networks to model the variations in emotions. We further introduce an attention transfer mechanism, which is based on a self-attention algorithm, to learn long-term dependencies. The self-attention transfer network (SATN) in our proposed approach takes advantage of attention transfer to learn attention from speech recognition, followed by transferring this knowledge into SER. An evaluation built on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset demonstrates the effectiveness of the proposed model.
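The abstract describes the SATN's central idea: self-attention maps learned by a teacher speech-recognition model are transferred to a student SER model. As a rough illustration only (the authors' actual system uses temporal convolutional networks and a trained ASR teacher; the weight shapes, random inputs, and mean-squared-error form of the transfer loss below are assumptions), here is a minimal numpy sketch of a self-attention map and an attention-transfer loss:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_map(X, Wq, Wk):
    """Scaled dot-product attention map (T x T) for T frame embeddings."""
    Q, K = X @ Wq, X @ Wk
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1)

def attention_transfer_loss(A_student, A_teacher):
    """Mean squared error between student (SER) and teacher (ASR) attention maps.

    The MSE form is an assumption for illustration; the paper's loss may differ.
    """
    return float(np.mean((A_student - A_teacher) ** 2))

rng = np.random.default_rng(0)
T, d_in, d_k = 5, 8, 4  # 5 frames, 8-dim frame features, 4-dim queries/keys
X = rng.standard_normal((T, d_in))
# Separate (randomly initialized) projections stand in for teacher and student.
Wq_s, Wk_s = rng.standard_normal((d_in, d_k)), rng.standard_normal((d_in, d_k))
Wq_t, Wk_t = rng.standard_normal((d_in, d_k)), rng.standard_normal((d_in, d_k))

A_s = self_attention_map(X, Wq_s, Wk_s)
A_t = self_attention_map(X, Wq_t, Wk_t)
loss = attention_transfer_loss(A_s, A_t)
```

In training, minimizing this transfer term alongside the emotion-classification loss would pull the student's attention toward the teacher's, which is the knowledge-transfer step the abstract refers to.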
Pages: 43-54
Page count: 12
Related papers
50 records in total
  • [31] Self-attention Bi-RNN for developer emotion recognition based on EEG
    Wang, Yingdong
    Zheng, Yuhui
    Cao, Lu
    Zhang, Zhiling
    Ruan, Qunsheng
    Wu, Qingfeng
    IET SOFTWARE, 2022,
  • [32] Attention to Emotions: Body Emotion Recognition In-the-Wild Using Self-attention Transformer Network
    Paiva, Pedro V. V.
    Ramos, Josue J. G.
    Gavrilova, Marina
    Carvalho, Marco A. G.
    COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VISIGRAPP 2023, 2024, 2103 : 206 - 228
  • [33] Self-attention Bi-RNN for developer emotion recognition based on EEG
    Wang, Yingdong
    Zheng, Yuhui
    Cao, Lu
    Zhang, Zhiling
    Ruan, Qunsheng
    Wu, Qingfeng
    IET SOFTWARE, 2023, 17 (04) : 620 - 631
  • [34] Spatial-frequency convolutional self-attention network for EEG emotion recognition
    Li, Dongdong
    Xie, Li
    Chai, Bing
    Wang, Zhe
    Yang, Hai
    APPLIED SOFT COMPUTING, 2022, 122
  • [35] Music Emotion Recognition Fusion on CNN-BiLSTM and Self-Attention Model
    Zhong, Zhipeng
    Wang, Hailong
    Su, Guibin
    Liu, Lin
    Pei, Dongmei
    Computer Engineering and Applications, 2024, 59 (03) : 94 - 103
  • [36] Convolutional Self-Attention Networks
    Yang, Baosong
    Wang, Longyue
    Wong, Derek F.
    Chao, Lidia S.
    Tu, Zhaopeng
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 4040 - 4045
  • [37] Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism
    Mountzouris, Konstantinos
    Perikos, Isidoros
    Hatzilygeroudis, Ioannis
    Corchado, Juan M.
    Iglesias, Carlos A.
    Kim, Byung-Gyu
    Mehmood, Rashid
    Ren, Fuji
    Lee, In
    ELECTRONICS, 2023, 12 (20)
  • [38] Combining Part-of-Speech Tags and Self-Attention Mechanism for Simile Recognition
    Zhang, Pengfei
    Cai, Yi
    Chen, Junying
    Chen, Wenhao
    Song, Hengjie
    IEEE ACCESS, 2019, 7 : 163864 - 163876
  • [39] SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
    Parcollet, Titouan
    van Dalen, Rogier
    Zhang, Shucong
    Bhattacharya, Sourav
    INTERSPEECH 2024, 2024, : 3460 - 3464
  • [40] Self-labeling with feature transfer for speech emotion recognition
    Wen, Guihua
    Liao, Huiqiang
    Li, Huihui
    Wen, Pengchen
    Zhang, Tong
    Gao, Sande
    Wang, Bao
    KNOWLEDGE-BASED SYSTEMS, 2022, 254