Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers

Times Cited: 0
Authors
Zaidi, Syed Aun Muhammad [1]
Latif, Siddique [2]
Qadir, Junaid [3]
Affiliations
[1] Information Technology University (ITU), Lahore 54700, Pakistan
[2] Queensland University of Technology (QUT), Brisbane, QLD 4000, Australia
[3] Qatar University, College of Engineering, Computer Science & Engineering Department, Doha, Qatar
Source
IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY, 2024, Vol. 5
Keywords
Co-attention networks; graph attention networks; multi-modal learning; multimodal emotion recognition; speech
DOI
10.1109/OJCS.2024.3486904
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Despite recent progress in emotion recognition, state-of-the-art systems struggle to maintain their performance in cross-language settings. In this article, we propose a Multimodal Dual Attention Transformer (MDAT) model for cross-language multimodal emotion recognition. Our model utilises pre-trained models for multimodal feature extraction and is equipped with a dual attention mechanism, comprising graph attention and co-attention, to capture the complex dependencies across modalities and languages. In addition, it exploits a transformer encoder layer to learn high-level feature representations that improve emotion classification accuracy. This design preserves modality-specific emotional information while enhancing cross-modality and cross-language feature generalisation, yielding improved performance with minimal target-language data. We assess the model on four publicly available emotion recognition datasets and show that it outperforms recent approaches and baseline models.
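The pipeline the abstract describes (pre-trained feature extractors feeding a dual attention stage of graph attention and co-attention, followed by a transformer encoder layer and an emotion classifier) can be made concrete with a short sketch. The PyTorch code below is a minimal, hypothetical illustration of that idea, not the authors' implementation: the module names, dimensions, number of heads, three-modality setup, and four emotion classes are all assumptions made for illustration.

```python
# Hypothetical sketch of an MDAT-style dual-attention fusion model.
# All names, sizes, and the three-modality setup are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttention(nn.Module):
    """Single-head GAT-style attention over a fully connected graph of
    modality nodes."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)
        self.attn = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, num_modalities, dim)
        h = self.proj(nodes)
        B, N, D = h.shape
        # Pairwise attention logits e_ij from concatenated node features.
        hi = h.unsqueeze(2).expand(B, N, N, D)
        hj = h.unsqueeze(1).expand(B, N, N, D)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        alpha = torch.softmax(e, dim=-1)  # attention over neighbour nodes
        return torch.relu(alpha @ h)      # aggregated node features


class MDATSketch(nn.Module):
    """Dual attention (graph attention + co-attention), then a transformer
    encoder layer and an emotion classifier."""

    def __init__(self, dim: int = 256, num_classes: int = 4):
        super().__init__()
        self.graph_attn = GraphAttention(dim)
        self.co_attn = nn.MultiheadAttention(dim, num_heads=4,
                                             batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, audio, text, visual):
        # Each input: (batch, dim) pooled embedding from a pre-trained
        # extractor (e.g., a speech encoder and a text encoder).
        nodes = torch.stack([audio, text, visual], dim=1)  # (B, 3, dim)
        g = self.graph_attn(nodes)                         # graph attention
        # Co-attention: each modality attends to the graph-attended others.
        co, _ = self.co_attn(query=nodes, key=g, value=g)
        fused = self.encoder(g + co)           # high-level representation
        return self.classifier(fused.mean(dim=1))  # emotion logits


if __name__ == "__main__":
    model = MDATSketch()
    a, t, v = (torch.randn(2, 256) for _ in range(3))
    print(model(a, t, v).shape)  # torch.Size([2, 4])
```

Pooled embeddings from pre-trained extractors would supply the three inputs; combining the graph-attended and co-attended features additively before the encoder is one plausible reading of "dual attention", chosen here only to keep the sketch self-contained.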
Pages: 684-693
Page Count: 10
Related Papers
41 records in total
  • [31] ET-GAN: Cross-Language Emotion Transfer Based on Cycle-Consistent Generative Adversarial Networks
    Jia, Xiaoqi
    Tai, Jianwei
    Zhou, Hang
    Li, Yakai
    Zhang, Weijuan
    Du, Haichao
    Huang, Qingjia
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325: 2038-2045
  • [32] Emotion recognition using cross-modal attention from EEG and facial expression
    Cui, Rongxuan
    Chen, Wanzhong
    Li, Mingyang
    KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [33] MemoCMT: multimodal emotion recognition using cross-modal transformer-based feature fusion
    Khan, Mustaqeem
    Tran, Phuong-Nam
    Pham, Nhat Truong
    El Saddik, Abdulmotaleb
    Othmani, Alice
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [34] Self-supervised Multimodal Emotion Recognition Combining Temporal Attention Mechanism and Unimodal Label Automatic Generation Strategy
    Sun, Q.
    Wang, S.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2024, 46 (02): 588-601
  • [35] Multi-corpus emotion recognition method based on cross-modal gated attention fusion
    Ryumina, Elena
    Ryumin, Dmitry
    Axyonov, Alexandr
    Ivanko, Denis
    Karpov, Alexey
    PATTERN RECOGNITION LETTERS, 2025, 190: 192-200
  • [36] AVaTER: Fusing Audio, Visual, and Textual Modalities Using Cross-Modal Attention for Emotion Recognition
    Das, Avishek
    Sarma, Moumita Sen
    Hoque, Mohammed Moshiul
    Siddique, Nazmul
    Dewan, M. Ali Akber
    SENSORS, 2024, 24 (18)
  • [37] Cross-Modal Guiding Neural Network for Multimodal Emotion Recognition From EEG and Eye Movement Signals
    Fu, Baole
    Chu, Wenhao
    Gu, Chunrui
    Liu, Yinhua
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (10): 5865-5876
  • [38] SWRR: Feature Map Classifier Based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition
    Zhao, Ziping
    Gao, Tian
    Wang, Haishuai
    Schuller, Bjoern
    INTERSPEECH 2023, 2023: 2433-2437
  • [39] A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition
    Chang, Chun-Min
    Su, Bo-Hao
    Lin, Shih-Chen
    Li, Jeng-Lin
    Lee, Chi-Chun
    2017 SEVENTH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2017: 377-382
  • [40] Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention
    Praveen, R. Gnana
    Cardinal, Patrick
    Granger, Eric
    IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE, 2023, 5 (03): 360-373