With the explosion of multi-modal data, multi-modal sentiment analysis (MSA) has emerged and attracted widespread attention. Unfortunately, conventional multi-modal research relies on large-scale datasets: collecting and annotating such datasets is challenging and resource-intensive, and training on them further increases research cost. In contrast, the recently proposed few-shot MSA (FMSA) requires only a few samples for training and is therefore more practical and realistic. Prompt-based methods have been explored for FMSA, but they do not sufficiently consider or leverage the information specificity of the visual modality. We therefore propose a vision-enhanced prompt-based model built on graph structure to better exploit visual information for fusion and collaboration when encoding and optimizing prompt representations. Specifically, we first design an aggregation-based multi-modal attention module. Then, building on this module and biaffine attention, we construct a syntax–semantic dual-channel graph convolutional network that optimizes the encoding of learnable prompts by exploiting vision-enhanced semantic and syntactic knowledge. Finally, we propose a collaboration-based optimization module built on a collaborative attention mechanism, which employs visual information to collaboratively refine prompt representations. Extensive experiments on both coarse-grained and fine-grained MSA datasets demonstrate that our model significantly outperforms the baseline models.
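The abstract names the syntax–semantic dual-channel design only at a high level. As an illustration, the following is a minimal PyTorch sketch, under our own assumptions, of how biaffine attention can induce a soft semantic adjacency matrix that is convolved alongside a syntactic (dependency-parse) adjacency in a dual-channel GCN layer. All names (`BiaffineAttention`, `DualChannelGCNLayer`, `adj_syn`) and the concatenation-based fusion are hypothetical choices for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiaffineAttention(nn.Module):
    """Scores every token pair to induce a soft semantic adjacency matrix."""
    def __init__(self, dim):
        super().__init__()
        self.U = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.U)

    def forward(self, h):                          # h: (batch, seq, dim)
        scores = h @ self.U @ h.transpose(1, 2)    # (batch, seq, seq)
        return F.softmax(scores, dim=-1)           # row-normalized adjacency

class DualChannelGCNLayer(nn.Module):
    """One layer convolving over a syntactic and a semantic graph, then fusing."""
    def __init__(self, dim):
        super().__init__()
        self.w_syn = nn.Linear(dim, dim)
        self.w_sem = nn.Linear(dim, dim)
        self.biaffine = BiaffineAttention(dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, h, adj_syn):
        # Syntactic channel: degree-normalized message passing over the
        # dependency-parse adjacency supplied by an external parser.
        deg = adj_syn.sum(-1, keepdim=True).clamp(min=1.0)
        h_syn = F.relu(self.w_syn(adj_syn @ h) / deg)
        # Semantic channel: message passing over the biaffine-induced graph.
        adj_sem = self.biaffine(h)
        h_sem = F.relu(self.w_sem(adj_sem @ h))
        # Fuse the two channels into a single representation.
        return self.fuse(torch.cat([h_syn, h_sem], dim=-1))

# Usage on random data (dimensions are arbitrary):
batch, seq, dim = 2, 10, 64
h = torch.randn(batch, seq, dim)                  # vision-enhanced token states
adj_syn = torch.eye(seq).expand(batch, -1, -1)    # stand-in dependency adjacency
out = DualChannelGCNLayer(dim)(h, adj_syn)        # (2, 10, 64)
```

In this sketch the semantic adjacency is learned end-to-end through the biaffine scorer while the syntactic adjacency is fixed per sentence, which is one common way to realize the semantic/syntactic split the abstract describes.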