Transformer-based short-term memory attention for enhanced multimodal sentiment analysis

Times Cited: 1
Authors
Shao, Dangguo [1,2]
Tang, Kaiqiang [1]
Li, Jingtao [1]
Yi, Sanli [1]
Ma, Lei [1]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sentiment analysis; Memory attention; Multimodal fusion; Self-distillation; Modal interactions;
DOI
10.1007/s00371-025-03883-z
CLC Number
TP31 [Computer Software];
Discipline Code
081202; 0835;
Abstract
In multimodal sentiment analysis, effectively utilizing and fusing information from multiple modalities remains a challenging task. Most existing studies focus on single-modal information, neglecting the potential of multimodal data. To address this, we propose a Transformer-based short-term memory attention (S-MA) model that captures both intra- and inter-modal interactions, learns the weight distribution between different modalities, and enhances modality representations. The model introduces a short-term memory attention module to retain significant features obtained from the previous training session, employing Transformer structures for both intra-modal and inter-modal interactions. Additionally, we introduce a self-distillation method that uses early-stage model outputs as soft labels to guide subsequent training, optimizing the model's representational capabilities. Experimental results on three public datasets demonstrate that the S-MA model outperforms previous state-of-the-art baselines, particularly excelling on the MVSA-Single and HFM datasets, with improvements of 1.98, 1.43 and 1.67, 1.75 percentage points in accuracy (ACC) and F1 metrics, respectively. The source code and datasets are available at https://github.com/Doyken/S-MA.
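The abstract describes two mechanisms only at a high level: a short-term memory attention module that carries salient features over from the previous training step, and a self-distillation loss that reuses early-stage model outputs as soft labels. The minimal PyTorch sketch below illustrates one plausible reading of those two ideas; it is not the authors' released implementation (see the linked repository for that), and the names ShortTermMemoryAttention, memory_momentum, distill_weight, and temperature are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShortTermMemoryAttention(nn.Module):
    """Cross-attends the current fused multimodal features to a buffer of features
    remembered from the previous training step (one hypothetical reading of the
    paper's 'short-term memory attention')."""

    def __init__(self, dim: int, num_heads: int = 4, memory_momentum: float = 0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.memory_momentum = memory_momentum
        # Running summary of salient features kept from the previous training step.
        self.register_buffer("memory", torch.zeros(1, 1, dim))

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (batch, seq, dim) features from the Transformer fusion stage.
        mem = self.memory.expand(fused.size(0), -1, -1)
        out, _ = self.attn(query=fused, key=mem, value=mem)
        if self.training:
            # Retain a summary of this step's features for use in the next step.
            with torch.no_grad():
                summary = fused.mean(dim=(0, 1), keepdim=True)
                self.memory = (self.memory_momentum * self.memory
                               + (1.0 - self.memory_momentum) * summary)
        return fused + out  # residual connection


def self_distillation_loss(student_logits, teacher_logits, labels,
                           distill_weight=0.3, temperature=2.0):
    """Hard-label cross-entropy plus KL divergence to soft labels produced by an
    earlier-stage snapshot of the same model (the self-distillation idea)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits.detach() / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    return (1.0 - distill_weight) * ce + distill_weight * kd
```

In this sketch, the earlier-stage "teacher" logits would come from a frozen copy of the model saved at an earlier epoch; how the paper selects that snapshot and weights the two loss terms is not specified in the abstract.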
Pages: 8537-8552
Page count: 16