MULTIMODAL TRANSFORMER WITH LEARNABLE FRONTEND AND SELF ATTENTION FOR EMOTION RECOGNITION

Cited by: 11
Authors
Dutta, Soumya
Ganapathy, Sriram
Affiliations
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Multi-modal emotion recognition; Transformer networks; self-attention models; learnable front-end; SENTIMENT ANALYSIS; FUSION;
DOI
10.1109/ICASSP43922.2022.9747723
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
In this work, we propose a novel approach for multi-modal emotion recognition from conversations using speech and text. The audio representations are learned jointly with a learnable audio front-end (LEAF) model feeding a CNN-based classifier. The text representations are derived from pre-trained bidirectional encoder representations from transformers (BERT) combined with a gated recurrent unit (GRU) network. The textual and audio representations are each processed by a bidirectional GRU network with self-attention. Multi-modal information extraction is then achieved with a transformer whose inputs are the textual and audio embeddings at the utterance level. Experiments on the IEMOCAP database show that the proposed framework improves over the current state-of-the-art results under all common test settings, primarily due to improved emotion recognition performance in the audio domain. We also show that the model is more robust to textual errors caused by an automatic speech recognition (ASR) system.
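The utterance-level fusion described above can be illustrated with a minimal scaled dot-product self-attention sketch in NumPy. This is not the paper's implementation: the embedding size, random projection weights, and the stand-in audio/text vectors are placeholders chosen for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention over a token sequence X of shape (T, d).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 8  # illustrative embedding size, not the paper's configuration
audio_emb = rng.normal(size=d)  # stand-in for the LEAF/CNN audio embedding
text_emb = rng.normal(size=d)   # stand-in for the BERT/GRU text embedding

# Treat the two modality embeddings as a 2-token utterance-level sequence,
# so each modality can attend to itself and to the other modality.
X = np.stack([audio_emb, text_emb])
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
fused = self_attention(X, Wq, Wk, Wv)
print(fused.shape)  # one fused d-dimensional vector per modality token
```

In the actual model a full transformer layer (multi-head attention, residual connections, feed-forward sublayers) operates on these embeddings; this sketch isolates only the attention step that mixes the two modalities.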
Pages: 6917 - 6921
Page count: 5
Related Papers
50 records total
  • [21] Multimodal sentiment and emotion recognition in hyperbolic space
    Arano, Keith April
    Orsenigo, Carlotta
    Soto, Mauricio
    Vercellis, Carlo
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 184
  • [22] Multimodal emotion recognition based on audio and text by using hybrid attention networks
    Zhang, Shiqing
    Yang, Yijiao
    Chen, Chen
    Liu, Ruixin
    Tao, Xin
    Guo, Wenping
    Xu, Yicheng
    Zhao, Xiaoming
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 85
  • [23] Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
    Zaidi, Syed Aun Muhammad
    Latif, Siddique
    Qadir, Junaid
    IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY, 2024, 5 : 684 - 693
  • [24] Self-supervised Multimodal Emotion Recognition Combining Temporal Attention Mechanism and Unimodal Label Automatic Generation Strategy
    Sun Q.
    Wang S.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2024, 46 (02): : 588 - 601
  • [25] Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion
    Xie, Baijun
    Sidulova, Mariia
    Park, Chung Hyuk
    SENSORS, 2021, 21 (14)
  • [26] SS-Trans (Single-Stream Transformer for Multimodal Sentiment Analysis and Emotion Recognition): The Emotion Whisperer-A Single-Stream Transformer for Multimodal Sentiment Analysis
    Ji, Mingyu
    Wei, Ning
    Zhou, Jiawei
    Wang, Xin
    ELECTRONICS, 2024, 13 (21)
  • [27] Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition
    Liu, Pengfei
    Li, Kun
    Meng, Helen
    INTERSPEECH 2020, 2020, : 379 - 383
  • [28] Multi-Label Multimodal Emotion Recognition With Transformer-Based Fusion and Emotion-Level Representation Learning
    Le, Hoai-Duy
    Lee, Guee-Sang
    Kim, Soo-Hyung
    Kim, Seungwon
    Yang, Hyung-Jeong
    IEEE ACCESS, 2023, 11 : 14742 - 14751
  • [29] Multimodal Emotion Recognition Using Deep Generalized Canonical Correlation Analysis with an Attention Mechanism
    Lan, Yu-Ting
    Liu, Wei
    Lu, Bao-Liang
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [30] A Multi-Level Circulant Cross-Modal Transformer for Multimodal Speech Emotion Recognition
    Gong, Peizhu
    Liu, Jin
    Wu, Zhongdai
    Han, Bing
    Wang, Y. Ken
    He, Huihua
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02): : 4203 - 4220