Transformer-based short-term memory attention for enhanced multimodal sentiment analysis

Times Cited: 1
Authors
Shao, Dangguo [1,2]
Tang, Kaiqiang [1]
Li, Jingtao [1]
Yi, Sanli [1]
Ma, Lei [1]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sentiment analysis; Memory attention; Multimodal fusion; Self-distillation; Modal interactions;
DOI
10.1007/s00371-025-03883-z
CLC Number
TP31 [Computer Software];
Discipline Code
081202; 0835;
Abstract
In multimodal sentiment analysis, effectively utilizing and fusing information from multiple modalities remains a challenging task. Most existing studies focus on single-modal information, neglecting the potential of multimodal data. To address this, we propose a Transformer-based short-term memory attention (S-MA) model that captures both intra- and inter-modal interactions, learns the weight distribution between different modalities, and enhances modality representations. The model introduces a short-term memory attention module to retain significant features obtained from the previous training session, employing Transformer structures for both intra-modal and inter-modal interactions. Additionally, we introduce a self-distillation method that uses early-stage model outputs as soft labels to guide subsequent training, optimizing the model's representational capabilities. Experimental results on three public datasets demonstrate that the S-MA model outperforms previous state-of-the-art baselines, particularly excelling on the MVSA-Single and HFM datasets, with improvements of 1.98, 1.43 and 1.67, 1.75 percentage points in accuracy (ACC) and F1 metrics, respectively. The source code and datasets are available at https://github.com/Doyken/S-MA.
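The abstract describes two mechanisms only at a high level: a short-term memory attention module that carries salient features over from the previous training step, and a self-distillation loss that reuses early-stage model outputs as soft labels. The minimal PyTorch sketch below illustrates one plausible reading of those two ideas; it is not the authors' released implementation (see the linked repository for that), and the names ShortTermMemoryAttention, memory_momentum, distill_weight, and temperature are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShortTermMemoryAttention(nn.Module):
    """Cross-attends the current fused multimodal features to a buffer of features
    remembered from the previous training step (one hypothetical reading of the
    paper's 'short-term memory attention')."""

    def __init__(self, dim: int, num_heads: int = 4, memory_momentum: float = 0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.memory_momentum = memory_momentum
        # Running summary of salient features kept from the previous training step.
        self.register_buffer("memory", torch.zeros(1, 1, dim))

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (batch, seq, dim) features from the Transformer fusion stage.
        mem = self.memory.expand(fused.size(0), -1, -1)
        out, _ = self.attn(query=fused, key=mem, value=mem)
        if self.training:
            # Retain a summary of this step's features for use in the next step.
            with torch.no_grad():
                summary = fused.mean(dim=(0, 1), keepdim=True)
                self.memory = (self.memory_momentum * self.memory
                               + (1.0 - self.memory_momentum) * summary)
        return fused + out  # residual connection


def self_distillation_loss(student_logits, teacher_logits, labels,
                           distill_weight=0.3, temperature=2.0):
    """Hard-label cross-entropy plus KL divergence to soft labels produced by an
    earlier-stage snapshot of the same model (the self-distillation idea)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits.detach() / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    return (1.0 - distill_weight) * ce + distill_weight * kd
```

In this sketch, the earlier-stage "teacher" logits would come from a frozen copy of the model saved at an earlier epoch; how the paper selects that snapshot and weights the two loss terms is not specified in the abstract.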
Pages: 8537-8552
Page count: 16