Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework

Cited by: 27
Authors
Liu, Yang [1 ]
Sun, Haoqin [1 ]
Guan, Wenbo [1 ]
Xia, Yuqi [1 ]
Zhao, Zhen [1 ]
Affiliations
[1] Qingdao Univ Sci & Technol, Sch Informat Sci & Technol, Qingdao 266061, Peoples R China
Keywords
Speech emotion recognition; Utterance-level contextual information; Multi-scale fusion framework; Neural networks
DOI
10.1016/j.specom.2022.02.006
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Accurately recognizing emotion from speech is a necessary yet challenging task due to the variability of speech and emotion. In this paper, a novel method combining a self-attention mechanism with a multi-scale fusion framework is proposed for multi-modal speech emotion recognition (SER) using speech and text information. A self-attentional bidirectional contextual LSTM (bc-LSTM) is proposed to learn context-sensitive dependencies from speech. Specifically, the BLSTM layer learns long-term dependencies and utterance-level contextual information, while the multi-head self-attention layer makes the model focus on the features most relevant to the emotions. A self-attentional multi-channel CNN (MCNN), which takes advantage of static and dynamic channels, is applied to learn general and thematic features from text. Finally, a multi-scale fusion strategy, comprising feature-level fusion and decision-level fusion, is applied to improve overall performance. Experimental results on the benchmark dataset IEMOCAP demonstrate that our method achieves absolute improvements of 1.48% and 3.00% over state-of-the-art strategies in weighted accuracy (WA) and unweighted accuracy (UA), respectively.
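The abstract describes the architecture only at a high level. Below is a minimal PyTorch sketch of the described pipeline, offered as an illustration rather than the authors' implementation: the module names (SelfAttentionBLSTM, MultiChannelTextCNN, MultiScaleFusion), the layer sizes, the 40-dimensional acoustic features, the four emotion classes, and the uniform averaging used for decision-level fusion are all assumptions not taken from the paper.

```python
# Illustrative sketch of the described architecture; all hyperparameters,
# module names, and the decision-fusion weighting are assumptions.
import torch
import torch.nn as nn


class SelfAttentionBLSTM(nn.Module):
    """Speech branch: BLSTM for utterance-level context, then multi-head self-attention."""

    def __init__(self, feat_dim=40, hidden=128, heads=4, n_classes=4):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                      # x: (batch, frames, feat_dim)
        h, _ = self.blstm(x)                   # long-term bidirectional context
        a, _ = self.attn(h, h, h)              # focus on emotion-salient frames
        emb = a.mean(dim=1)                    # pooled utterance embedding
        return emb, self.fc(emb)


class MultiChannelTextCNN(nn.Module):
    """Text branch: CNN over a static (frozen) and a dynamic (trainable) embedding channel."""

    def __init__(self, vocab=10000, emb_dim=300, n_filters=100, n_classes=4):
        super().__init__()
        self.static = nn.Embedding(vocab, emb_dim)   # frozen: general features
        self.static.weight.requires_grad = False
        self.dynamic = nn.Embedding(vocab, emb_dim)  # fine-tuned: thematic features
        self.convs = nn.ModuleList(
            nn.Conv1d(2 * emb_dim, n_filters, k) for k in (3, 4, 5))
        self.fc = nn.Linear(3 * n_filters, n_classes)

    def forward(self, tokens):                 # tokens: (batch, words)
        e = torch.cat([self.static(tokens), self.dynamic(tokens)], dim=-1)
        e = e.transpose(1, 2)                  # (batch, channels, words) for Conv1d
        pooled = [c(e).relu().max(dim=-1).values for c in self.convs]
        emb = torch.cat(pooled, dim=-1)
        return emb, self.fc(emb)


class MultiScaleFusion(nn.Module):
    """Feature-level fusion of the two embeddings plus decision-level fusion of logits."""

    def __init__(self, speech_dim=256, text_dim=300, n_classes=4):
        super().__init__()
        self.speech = SelfAttentionBLSTM(n_classes=n_classes)
        self.text = MultiChannelTextCNN(n_classes=n_classes)
        self.fusion_fc = nn.Linear(speech_dim + text_dim, n_classes)

    def forward(self, speech_x, tokens):
        s_emb, s_logits = self.speech(speech_x)
        t_emb, t_logits = self.text(tokens)
        # feature-level fusion: classify the concatenated branch embeddings
        f_logits = self.fusion_fc(torch.cat([s_emb, t_emb], dim=-1))
        # decision-level fusion: average the three classifiers' probabilities
        return (s_logits.softmax(-1) + t_logits.softmax(-1)
                + f_logits.softmax(-1)) / 3


# Usage with random stand-in data: two utterances of 200 acoustic frames
# and two 30-token transcripts yield one probability row per utterance.
model = MultiScaleFusion()
speech = torch.randn(2, 200, 40)
tokens = torch.randint(0, 10000, (2, 30))
print(model(speech, tokens).shape)  # torch.Size([2, 4])
```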
Pages: 1-9
Number of pages: 9
Related papers
50 in total
  • [41] Speech Emotion Recognition Using Multi-Scale Global-Local Representation Learning with Feature Pyramid Network
    Wang, Yuhua
    Huang, Jianxing
    Zhao, Zhengdao
    Lan, Haiyan
    Zhang, Xinjia
    APPLIED SCIENCES-BASEL, 2024, 14 (24):
  • [42] MBDA: A Multi-scale Bidirectional Perception Approach for Cross-Corpus Speech Emotion Recognition
    Li, Jiayang
    Wang, Xiaoye
    Li, Siyuan
    Shi, Jia
    Xiao, Yingyuan
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14877 : 329 - 341
  • [43] Speech Emotion Recognition Based on Self-Attention Weight Correction for Acoustic and Text Features
    Santoso, Jennifer
    Yamada, Takeshi
    Ishizuka, Kenkichi
    Hashimoto, Taiichi
    Makino, Shoji
    IEEE ACCESS, 2022, 10 : 115732 - 115743
  • [44] Enhancing speech emotion recognition: a deep learning approach with self-attention and acoustic features
    Aghajani, Khadijeh
    Zohrevandi, Mahbanou
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (05)
  • [45] Recurrent multi-head attention fusion network for combining audio and text for speech emotion recognition
    Ahn, Chung-Soo
    Kasun, L. L. Chamara
    Sivadas, Sunil
    Rajapakse, Jagath C.
    INTERSPEECH 2022, 2022, : 744 - 748
  • [46] Speech emotion recognition based on multi-feature and multi-lingual fusion
    Wang, Chunyi
    Ren, Ying
    Zhang, Na
    Cui, Fuwei
    Luo, Shiying
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (04) : 4897 - 4907
  • [47] Semantic Enhancement Network Integrating Label Knowledge for Multi-modal Emotion Recognition
    Zheng, HongFeng
    Miao, ShengFa
    Yu, Qian
    Mu, YongKang
    Jin, Xin
    Yan, KeShan
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT V, ICIC 2024, 2024, 14879 : 473 - 484
  • [48] Speech Emotion Recognition via Multi-Level Attention Network
    Liu, Ke
    Wang, Dekui
    Wu, Dongya
    Liu, Yutao
    Feng, Jun
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2278 - 2282
  • [49] EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition
    Gerczuk, Maurice
    Amiriparian, Shahin
    Ottl, Sandra
    Schuller, Bjorn W.
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (02) : 1472 - 1487
  • [50] Multimodal Approach of Speech Emotion Recognition Using Multi-Level Multi-Head Fusion Attention-Based Recurrent Neural Network
    Ngoc-Huynh Ho
    Yang, Hyung-Jeong
    Kim, Soo-Hyung
    Lee, Gueesang
    IEEE ACCESS, 2020, 8 : 61672 - 61686