Self-attention transfer networks for speech emotion recognition

Cited by: 3
Authors
Ziping ZHAO [1]
Keru WANG [1]
Zhongtian BAO [1]
Zixing ZHANG [2]
Nicholas CUMMINS [3, 4]
Shihuang SUN [5]
Haishuai WANG [5]
Jianhua TAO [6]
Björn W. SCHULLER [1, 2, 3]
Affiliations
[1] College of Computer and Information Engineering, Tianjin Normal University
[2] GLAM-Group on Language, Audio & Music, Imperial College London
[3] Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg
[4] Department of Biostatistics and Health Informatics, IoPPN, King's College London
[5] Department of Computer Science and Engineering, Fairfield University
[6] National Laboratory of Pattern Recognition, CASIA
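Funding
EU Horizon 2020; National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TN912.34 [Speech recognition and equipment]
Discipline classification code
Abstract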
Background: A crucial element of human-machine interaction, the automatic detection of emotional states from human speech has long been regarded as a challenging task for machine learning models. One vital challenge in speech emotion recognition (SER) is learning robust and discriminative representations from speech. Although machine learning methods have been widely applied in SER research, the inadequate amount of available annotated data has become a bottleneck impeding the extended application of such techniques (e.g., deep neural networks). To address this issue, we present a deep learning method that combines knowledge transfer and self-attention for SER tasks. Herein, we apply the log-Mel spectrogram with deltas and delta-deltas as inputs. Moreover, given that emotions are time-dependent, we apply temporal convolutional neural networks to model the variations in emotions. We further introduce an attention transfer mechanism, which is based on a self-attention algorithm to learn long-term dependencies. The self-attention transfer network (SATN) in our proposed approach takes advantage of attention transfer to learn attention from speech recognition, followed by transferring this knowledge into SER. An evaluation on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset demonstrates the effectiveness of the proposed model.
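The input representation named in the abstract (log-Mel spectrogram plus deltas and delta-deltas) can be illustrated with a minimal sketch. It assumes the common regression-based delta formula with edge padding; the window width of 2 and the function names `delta` and `stack_static_delta_delta2` are illustrative choices, not details taken from the paper.

```python
def delta(frames, width=2):
    """Regression-based delta over time for a sequence of feature frames.

    frames: list of T frames, each a list of F floats (e.g. log-Mel bands).
    width:  neighbouring frames used on each side (assumed value, not from the paper).
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for f in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so the first and last frames are repeated at the edges.
                ahead = frames[min(t + n, num_frames - 1)][f]
                behind = frames[max(t - n, 0)][f]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_static_delta_delta2(log_mel, width=2):
    """Return three channels: static features, deltas, and delta-deltas."""
    d1 = delta(log_mel, width)
    d2 = delta(d1, width)  # delta-deltas are deltas of the deltas
    return [log_mel, d1, d2]


# Toy input: 6 frames, 2 bands, each band a linear ramp over time.
ramp = [[float(t), 2.0 * t] for t in range(6)]
channels = stack_static_delta_delta2(ramp)
print(len(channels), len(channels[0]), len(channels[0][0]))
```

Each of the three channels keeps the time-by-frequency shape of the input, so the stack can be fed to a convolutional front end in the same way as a three-channel image.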
Pages: 43-54
Page count: 12