Attention-optimized vision-enhanced prompt learning for few-shot multi-modal sentiment analysis

Cited: 0
Authors
Zhou, Zikai [1 ]
Qiao, Baiyou [1 ]
Feng, Haisong [2 ]
Han, Donghong [1 ]
Wu, Gang [1 ]
Affiliations
[1] School of Computer Science and Engineering, Northeastern University, Shenyang
[2] School of Informatics, Xiamen University, Xiamen
Funding
National Natural Science Foundation of China;
Keywords
Few-shot learning; GCN; Multi-modal sentiment analysis; Prompt learning;
DOI
10.1007/s00521-024-10297-w
Abstract
With the explosion of multi-modal data, multi-modal sentiment analysis (MSA) has emerged and attracted widespread attention. Unfortunately, conventional multi-modal research relies on large-scale datasets. On the one hand, collecting and annotating large-scale datasets is challenging and resource-intensive; on the other hand, training on large-scale datasets also increases research costs. In contrast, the recently proposed few-shot MSA (FMSA) requires only a few samples for training and is therefore more practical and realistic. Prompt-based methods have been investigated for FMSA, but they have not sufficiently considered or leveraged the information specificity of the visual modality. Thus, we propose a vision-enhanced prompt-based model built on a graph structure to better utilize visual information for fusion and collaboration when encoding and optimizing prompt representations. Specifically, we first design an aggregation-based multi-modal attention module. Then, based on this module and biaffine attention, we construct a syntax–semantic dual-channel graph convolutional network that optimizes the encoding of learnable prompts by understanding the vision-enhanced information in semantic and syntactic knowledge. Finally, we propose a collaboration-based optimization module built on a collaborative attention mechanism, which employs visual information to collaboratively optimize prompt representations. Extensive experiments on both coarse-grained and fine-grained MSA datasets demonstrate that our model significantly outperforms the baseline models. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
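The abstract mentions biaffine attention as the basis of the dual-channel GCN but gives no equations. As an illustrative sketch only (not the authors' implementation), biaffine attention scores every pair of token representations, and a row-wise softmax turns those scores into a soft adjacency matrix that a graph-convolution channel could consume; all names, shapes, and the random inputs below are assumptions for demonstration.

```python
import numpy as np

def biaffine_scores(H, U, W, b):
    """Pairwise biaffine scores: s[i, j] = h_i^T U h_j + W [h_i; h_j] + b.

    H: (n, d) token representations; U: (d, d) bilinear weights;
    W: (2d,) linear weights over the concatenated pair; b: scalar bias.
    Returns an (n, n) score matrix over all token pairs.
    """
    n, d = H.shape
    bilinear = H @ U @ H.T                                  # h_i^T U h_j for all i, j
    linear = (H @ W[:d])[:, None] + (H @ W[d:])[None, :]    # W1 h_i + W2 h_j, broadcast
    return bilinear + linear + b

def soft_adjacency(scores):
    """Row-wise softmax: each row becomes a distribution over neighbors."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stabilized exponent
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 4, 8                                   # 4 tokens, 8-dim representations
H = rng.normal(size=(n, d))
U = rng.normal(size=(d, d))
W = rng.normal(size=2 * d)
A = soft_adjacency(biaffine_scores(H, U, W, b=0.1))
print(A.shape)                                # (4, 4) soft adjacency
```

In a dual-channel design of this kind, one channel would typically use a hard adjacency from a dependency parse (syntax) while a learned soft adjacency like `A` serves the semantic channel.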
Pages: 21091–21105
Page count: 14
Related papers
50 records
  • [41] MHA-WoML: Multi-head attention and Wasserstein-OT for few-shot learning
    Yang, Junyan
    Jiang, Jie
    Guo, Yanming
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (04) : 681 - 694
  • [42] Few-shot based learning recaptured image detection with multi-scale feature fusion and attention
    Hussain, Israr
    Tan, Shunquan
    Huang, Jiwu
    PATTERN RECOGNITION, 2025, 161
  • [43] MuLAN: Multi-level attention-enhanced matching network for few-shot knowledge graph completion
    Li, Qianyu
    Feng, Bozheng
    Tang, Xiaoli
    Yu, Han
    Song, Hengjie
    NEURAL NETWORKS, 2024, 174
  • [44] MPE3: Learning meta-prompt with entity-enhanced semantics for few-shot named entity recognition
    Xia, Yuwei
    Tong, Zhao
    Wang, Liang
    Liu, Qiang
    Wu, Shu
    Zhang, Xiaoyu
    NEUROCOMPUTING, 2025, 620
  • [45] Multi-modal Sentiment and Emotion Joint Analysis with a Deep Attentive Multi-task Learning Model
    Zhang, Yazhou
    Rong, Lu
    Li, Xiang
    Chen, Rui
    ADVANCES IN INFORMATION RETRIEVAL, PT I, 2022, 13185 : 518 - 532
  • [46] Multi-Modal Sentiment Analysis Based on Image and Text Fusion Based on Cross-Attention Mechanism
    Li, Hongchan
    Lu, Yantong
    Zhu, Haodong
    ELECTRONICS, 2024, 13 (11)
  • [47] Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis
    Zhang, Yazhou
    Yu, Yang
    Wang, Mengyao
    Huang, Min
    Hossain, M. Shamim
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (05)
  • [48] EMNet: A Novel Few-Shot Image Classification Model with Enhanced Self-Correlation Attention and Multi-Branch Joint Module
    Li, Fufang
    Zhang, Weixiang
    Shang, Yi
    BIOMIMETICS, 2025, 10 (01)
  • [49] Structural Attention Enhanced Continual Meta-Learning for Graph Edge Labeling Based Few-Shot Remote Sensing Scene Classification
    Li, Feimo
    Li, Shuaibo
    Fan, Xinxin
    Li, Xiong
    Chang, Hongxing
    REMOTE SENSING, 2022, 14 (03)
  • [50] Exploring the potential of using ChatGPT for rhetorical move-step analysis: The impact of prompt refinement, few-shot learning, and fine-tuning
    Kim, Minjin
    Lu, Xiaofei
    JOURNAL OF ENGLISH FOR ACADEMIC PURPOSES, 2024, 71