Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers

Cited by: 0
Authors
Zaidi, Syed Aun Muhammad [1 ]
Latif, Siddique [2 ]
Qadir, Junaid [3 ]
Affiliations
[1] Information Technology University (ITU), Lahore 54700, Pakistan
[2] Queensland University of Technology (QUT), Brisbane, QLD 4000, Australia
[3] Qatar University, College of Engineering, Computer Science & Engineering Department, Doha, Qatar
Source
IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY, 2024, Vol. 5
Keywords
Co-attention networks; graph attention networks; multi-modal learning; multimodal emotion recognition; speech
DOI
10.1109/OJCS.2024.3486904
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Despite recent progress in emotion recognition, state-of-the-art systems are unable to achieve improved performance in cross-language settings. In this article, we propose a Multimodal Dual Attention Transformer (MDAT) model to improve cross-language multimodal emotion recognition. Our model utilises pre-trained models for multimodal feature extraction and is equipped with dual attention mechanisms, graph attention and co-attention, to capture the complex dependencies across modalities and languages. In addition, our model exploits a transformer encoder layer for high-level feature representation to improve emotion classification accuracy. This novel construct preserves modality-specific emotional information while enhancing cross-modality and cross-language feature generalisation, resulting in improved performance with minimal target-language data. We assess our model's performance on four publicly available emotion recognition datasets and establish its superior effectiveness compared to recent approaches and baseline models.
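The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch of how such a dual-attention fusion block could be wired together: cross-modal co-attention between speech and text streams, a small graph attention layer over the resulting modality nodes, and a transformer encoder layer before classification. All module names, dimensions, the single-layer graph attention implementation, and the mean-pooling choices are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of an MDAT-style dual-attention fusion block (assumed design,
# inferred from the abstract; not the paper's official implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttention(nn.Module):
    """Single-head graph attention over a fully connected set of modality nodes."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)
        self.attn = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, nodes):                        # nodes: (batch, n_nodes, dim)
        h = self.proj(nodes)
        n = h.size(1)
        # Pairwise attention logits between every pair of nodes.
        hi = h.unsqueeze(2).expand(-1, -1, n, -1)    # (b, n, n, d)
        hj = h.unsqueeze(1).expand(-1, n, -1, -1)    # (b, n, n, d)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        alpha = e.softmax(dim=-1)                    # (b, n, n) attention weights
        return alpha @ h                             # aggregated node features

class DualAttentionFusion(nn.Module):
    """Co-attention between modalities, graph attention over modality nodes,
    then a transformer encoder layer for high-level representation."""
    def __init__(self, dim=256, heads=4, n_classes=4):
        super().__init__()
        self.co_attn_st = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.co_attn_ts = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.graph_attn = GraphAttention(dim)
        self.encoder = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, speech, text):                 # (b, t_s, dim), (b, t_t, dim)
        # Co-attention: each modality queries the other.
        s2t, _ = self.co_attn_st(speech, text, text)
        t2s, _ = self.co_attn_ts(text, speech, speech)
        # Pool each attended stream into one node per modality.
        nodes = torch.stack([s2t.mean(1), t2s.mean(1)], dim=1)  # (b, 2, dim)
        fused = self.graph_attn(nodes)
        fused = self.encoder(fused)
        return self.classifier(fused.mean(1))        # emotion logits

# Toy usage with random stand-ins for pre-extracted features.
model = DualAttentionFusion()
logits = model(torch.randn(2, 50, 256), torch.randn(2, 20, 256))
print(logits.shape)  # torch.Size([2, 4])
```

In practice the speech and text inputs would come from the pre-trained feature extractors the abstract mentions (e.g., a wav2vec-style speech encoder and a BERT-style text encoder), projected to a common dimension; the random tensors above merely stand in for them.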
Pages: 684-693 (10 pages)