Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers

Cited by: 0
Authors
Zaidi, Syed Aun Muhammad [1 ]
Latif, Siddique [2 ]
Qadir, Junaid [3 ]
Affiliations
[1] Information Technology University (ITU), Lahore 54700, Pakistan
[2] Queensland University of Technology (QUT), Brisbane, QLD 4000, Australia
[3] Qatar University, College of Engineering, Computer Science & Engineering Department, Doha, Qatar
Source
IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY, 2024, Vol. 5
Keywords
Co-attention networks; graph attention networks; multi-modal learning; multimodal emotion recognition; speech
DOI
10.1109/OJCS.2024.3486904
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Despite recent progress in emotion recognition, state-of-the-art systems are unable to achieve improved performance in cross-language settings. In this article, we propose a Multimodal Dual Attention Transformer (MDAT) model to improve cross-language multimodal emotion recognition. Our model utilises pre-trained models for multimodal feature extraction and is equipped with dual attention mechanisms, graph attention and co-attention, to capture the complex dependencies across modalities and languages. In addition, our model exploits a transformer encoder layer for high-level feature representation to improve emotion classification accuracy. This novel construct preserves modality-specific emotional information while enhancing cross-modality and cross-language feature generalisation, resulting in improved performance with minimal target-language data. We assess our model's performance on four publicly available emotion recognition datasets and establish its superior effectiveness compared to recent approaches and baseline models.
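The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch of how such a dual-attention fusion block could be wired together: cross-modal co-attention between speech and text streams, a small graph attention layer over the resulting modality nodes, and a transformer encoder layer before classification. All module names, dimensions, the single-layer graph attention implementation, and the mean-pooling choices are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of an MDAT-style dual-attention fusion block (assumed design,
# inferred from the abstract; not the paper's official implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttention(nn.Module):
    """Single-head graph attention over a fully connected set of modality nodes."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)
        self.attn = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, nodes):                        # nodes: (batch, n_nodes, dim)
        h = self.proj(nodes)
        n = h.size(1)
        # Pairwise attention logits between every pair of nodes.
        hi = h.unsqueeze(2).expand(-1, -1, n, -1)    # (b, n, n, d)
        hj = h.unsqueeze(1).expand(-1, n, -1, -1)    # (b, n, n, d)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        alpha = e.softmax(dim=-1)                    # (b, n, n) attention weights
        return alpha @ h                             # aggregated node features

class DualAttentionFusion(nn.Module):
    """Co-attention between modalities, graph attention over modality nodes,
    then a transformer encoder layer for high-level representation."""
    def __init__(self, dim=256, heads=4, n_classes=4):
        super().__init__()
        self.co_attn_st = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.co_attn_ts = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.graph_attn = GraphAttention(dim)
        self.encoder = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, speech, text):                 # (b, t_s, dim), (b, t_t, dim)
        # Co-attention: each modality queries the other.
        s2t, _ = self.co_attn_st(speech, text, text)
        t2s, _ = self.co_attn_ts(text, speech, speech)
        # Pool each attended stream into one node per modality.
        nodes = torch.stack([s2t.mean(1), t2s.mean(1)], dim=1)  # (b, 2, dim)
        fused = self.graph_attn(nodes)
        fused = self.encoder(fused)
        return self.classifier(fused.mean(1))        # emotion logits

# Toy usage with random stand-ins for pre-extracted features.
model = DualAttentionFusion()
logits = model(torch.randn(2, 50, 256), torch.randn(2, 20, 256))
print(logits.shape)  # torch.Size([2, 4])
```

In practice the speech and text inputs would come from the pre-trained feature extractors the abstract mentions (e.g., a wav2vec-style speech encoder and a BERT-style text encoder), projected to a common dimension; the random tensors above merely stand in for them.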
Pages: 684-693 (10 pages)