Transformer-based correlation mining network with self-supervised label generation for multimodal sentiment analysis

Citations: 1
Authors
Wang, Ruiqing [1 ]
Yang, Qimeng [1 ]
Tian, Shengwei [1 ]
Yu, Long [2 ]
He, Xiaoyu [3 ]
Wang, Bo [1 ]
Affiliations
[1] Xinjiang Univ, Sch Software, Urumqi, Xinjiang, Peoples R China
[2] Xinjiang Univ, Network & Informat Ctr, Xinjiang, Peoples R China
[3] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal sentiment analysis; Transformer; Multimodal fusion; Collaborative learning; FUSION;
DOI
10.1016/j.neucom.2024.129163
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal Sentiment Analysis (MSA), which aims to recognize and understand a speaker's sentiment state by integrating information from natural language, facial expressions, and voice, has gained much attention in recent years. However, modeling multimodal data poses two main challenges: 1) potential sentiment correlations exist both across modalities and within each modality's context, making deep sentiment-correlation mining and information fusion difficult; 2) sentiment information tends to be unevenly distributed across modalities, making it hard to fully leverage every modality for collaborative learning. To address these challenges, we propose CMLG, a method based on correlation mining and label generation. CMLG uses a Squeeze-and-Excitation Network (SEN) to recalibrate modality features and employs Transformer-based intra-modal and inter-modal feature extractors to mine the intrinsic connections between different modalities. In addition, we design a Self-Supervised Label Generation Module (SLGM) that exploits the positive correlation between feature distances and label offsets to generate unimodal labels, and jointly trains the multimodal and unimodal tasks to capture sentiment differences. Extensive experiments on three benchmark datasets (MOSI, MOSEI, and SIMS) show that CMLG achieves excellent results.
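The abstract's core idea for the SLGM can be illustrated with a minimal sketch: a unimodal pseudo-label is produced by offsetting the shared multimodal label in proportion to how much closer a modality's feature lies to the positive or negative class center. Note this is an assumed reading of the described distance-to-offset correlation, not the paper's exact formulation; the function name, `scale` parameter, and center computation are hypothetical.

```python
import numpy as np

def generate_unimodal_label(feat, pos_center, neg_center, multimodal_label, scale=0.5):
    """Sketch of self-supervised unimodal label generation (assumed form).

    feat: unimodal feature vector for one sample.
    pos_center / neg_center: running class centers of positive / negative samples.
    multimodal_label: the ground-truth multimodal sentiment score for the sample.
    """
    # Distances from the unimodal feature to the two class centers.
    d_pos = np.linalg.norm(feat - pos_center)
    d_neg = np.linalg.norm(feat - neg_center)
    # Relative distance in [-1, 1]; negative means closer to the positive center.
    rel = (d_pos - d_neg) / (d_pos + d_neg + 1e-8)
    # Offset the shared label in proportion to the relative distance, so a
    # modality whose feature leans positive receives a more positive label.
    return multimodal_label - scale * rel
```

A feature sitting exactly at the positive center receives the maximum positive offset, while one equidistant from both centers keeps the multimodal label unchanged; joint training on the original and generated labels then exposes per-modality sentiment differences.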
Pages: 9
Related Papers
50 records in total
  • [31] ATTENTION-GUIDED CONTRASTIVE MASKED IMAGE MODELING FOR TRANSFORMER-BASED SELF-SUPERVISED LEARNING
    Zhan, Yucheng
    Zhao, Yucheng
    Luo, Chong
    Zhang, Yueyi
    Sun, Xiaoyan
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2490 - 2494
  • [32] A superior image inpainting scheme using Transformer-based self-supervised attention GAN model
    Zhou, Meili
    Liu, Xiangzhen
    Yi, Tingting
    Bai, Zongwen
    Zhang, Pei
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 233
  • [33] Self-supervised Transformer-Based Pre-training Method with General Plant Infection Dataset
    Wang, Zhengle
    Wang, Ruifeng
    Wang, Minjuan
    Lai, Tianyun
    Zhang, Man
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT II, 2025, 15032 : 189 - 202
  • [34] A Novel Transformer-Based Self-Supervised Learning Method to Enhance Photoplethysmogram Signal Artifact Detection
    Le, Thanh-Dung
    Macabiau, Clara
    Albert, Kevin
    Jouvet, Philippe
    Noumeir, Rita
    IEEE ACCESS, 2024, 12 : 159860 - 159874
  • [35] Self-supervised representation learning using multimodal Transformer for emotion recognition
    Goetz, Theresa
    Arora, Pulkit
    Erick, F. X.
    Holzer, Nina
    Sawant, Shrutika
    PROCEEDINGS OF THE 8TH INTERNATIONAL WORKSHOP ON SENSOR-BASED ACTIVITY RECOGNITION AND ARTIFICIAL INTELLIGENCE, IWOAR 2023, 2023,
  • [36] Interpretability in Sentiment Analysis: A Self-Supervised Approach to Sentiment Cue Extraction
    Sun, Yawei
    He, Saike
    Han, Xu
    Luo, Yan
    APPLIED SCIENCES-BASEL, 2024, 14 (07):
  • [37] Multimodal sentiment analysis based on improved correlation representation network
    Yaermaimaiti, Yilihamu
    Yan, Tianxing
    Zhuang, Guohang
    Kari, Tusongjiang
    INTERNATIONAL JOURNAL OF COMMUNICATION NETWORKS AND DISTRIBUTED SYSTEMS, 2024, 30 (06) : 679 - 698
  • [38] Self-HCL: Self-Supervised Multitask Learning with Hybrid Contrastive Learning Strategy for Multimodal Sentiment Analysis
    Fu, Youjia
    Fu, Junsong
    Xue, Huixia
    Xu, Zihao
    ELECTRONICS, 2024, 13 (14)
  • [39] A Systematic Review of Transformer-Based Pre-Trained Language Models through Self-Supervised Learning
    Kotei, Evans
    Thirunavukarasu, Ramkumar
    INFORMATION, 2023, 14 (03)
  • [40] Liveness Detection in Computer Vision: Transformer-Based Self-Supervised Learning for Face Anti-Spoofing
    Keresh, Arman
    Shamoi, Pakizar
    IEEE ACCESS, 2024, 12 : 185673 - 185685