Attention-optimized vision-enhanced prompt learning for few-shot multi-modal sentiment analysis

Cited: 0
Authors
Zhou, Zikai [1 ]
Qiao, Baiyou [1 ]
Feng, Haisong [2 ]
Han, Donghong [1 ]
Wu, Gang [1 ]
Affiliations
[1] School of Computer Science and Engineering, Northeastern University, Shenyang
[2] School of Informatics, Xiamen University, Xiamen
Funding
National Natural Science Foundation of China;
Keywords
Few-shot learning; GCN; Multi-modal sentiment analysis; Prompt learning;
DOI
10.1007/s00521-024-10297-w
Abstract
With the explosion of multi-modal data, multi-modal sentiment analysis (MSA) has emerged and attracted widespread attention. Unfortunately, conventional multi-modal research relies on large-scale datasets. On the one hand, collecting and annotating large-scale datasets is challenging and resource-intensive; on the other hand, training on large-scale datasets also increases research costs. In contrast, the recently proposed few-shot MSA (FMSA) requires only a few samples for training and is therefore more practical and realistic. Prompt-based methods have been investigated for FMSA, but they have not sufficiently considered or leveraged the information specificity of the visual modality. Thus, we propose a vision-enhanced prompt-based model built on a graph structure to better utilize visual information for fusion and collaboration when encoding and optimizing prompt representations. Specifically, we first design an aggregation-based multi-modal attention module. Then, based on this module and biaffine attention, we construct a syntax–semantic dual-channel graph convolutional network that optimizes the encoding of learnable prompts by understanding the vision-enhanced information in semantic and syntactic knowledge. Finally, we propose a collaboration-based optimization module built on a collaborative attention mechanism, which employs visual information to collaboratively optimize prompt representations. Extensive experiments on both coarse-grained and fine-grained MSA datasets demonstrate that our model significantly outperforms the baseline models. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
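The abstract mentions biaffine attention as the basis of the dual-channel GCN but gives no equations. As an illustrative sketch only (not the authors' implementation), biaffine attention scores every pair of token representations, and a row-wise softmax turns those scores into a soft adjacency matrix that a graph-convolution channel could consume; all names, shapes, and the random inputs below are assumptions for demonstration.

```python
import numpy as np

def biaffine_scores(H, U, W, b):
    """Pairwise biaffine scores: s[i, j] = h_i^T U h_j + W [h_i; h_j] + b.

    H: (n, d) token representations; U: (d, d) bilinear weights;
    W: (2d,) linear weights over the concatenated pair; b: scalar bias.
    Returns an (n, n) score matrix over all token pairs.
    """
    n, d = H.shape
    bilinear = H @ U @ H.T                                  # h_i^T U h_j for all i, j
    linear = (H @ W[:d])[:, None] + (H @ W[d:])[None, :]    # W1 h_i + W2 h_j, broadcast
    return bilinear + linear + b

def soft_adjacency(scores):
    """Row-wise softmax: each row becomes a distribution over neighbors."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stabilized exponent
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 4, 8                                   # 4 tokens, 8-dim representations
H = rng.normal(size=(n, d))
U = rng.normal(size=(d, d))
W = rng.normal(size=2 * d)
A = soft_adjacency(biaffine_scores(H, U, W, b=0.1))
print(A.shape)                                # (4, 4) soft adjacency
```

In a dual-channel design of this kind, one channel would typically use a hard adjacency from a dependency parse (syntax) while a learned soft adjacency like `A` serves the semantic channel.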
Pages: 21091–21105
Page count: 14
Related papers
50 records
  • [41] MHA-WoML: Multi-head attention and Wasserstein-OT for few-shot learning
    Yang, Junyan
    Jiang, Jie
    Guo, Yanming
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (04) : 681 - 694
  • [42] Few-shot based learning recaptured image detection with multi-scale feature fusion and attention
    Hussain, Israr
    Tan, Shunquan
    Huang, Jiwu
    PATTERN RECOGNITION, 2025, 161
  • [43] MuLAN: Multi-level attention-enhanced matching network for few-shot knowledge graph completion
    Li, Qianyu
    Feng, Bozheng
    Tang, Xiaoli
    Yu, Han
    Song, Hengjie
    NEURAL NETWORKS, 2024, 174
  • [44] MPE3: Learning meta-prompt with entity-enhanced semantics for few-shot named entity recognition
    Xia, Yuwei
    Tong, Zhao
    Wang, Liang
    Liu, Qiang
    Wu, Shu
    Zhang, Xiaoyu
    NEUROCOMPUTING, 2025, 620
  • [45] Multi-modal Sentiment and Emotion Joint Analysis with a Deep Attentive Multi-task Learning Model
    Zhang, Yazhou
    Rong, Lu
    Li, Xiang
    Chen, Rui
    ADVANCES IN INFORMATION RETRIEVAL, PT I, 2022, 13185 : 518 - 532
  • [46] Multi-Modal Sentiment Analysis Based on Image and Text Fusion Based on Cross-Attention Mechanism
    Li, Hongchan
    Lu, Yantong
    Zhu, Haodong
    ELECTRONICS, 2024, 13 (11)
  • [47] Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis
    Zhang, Yazhou
    Yu, Yang
    Wang, Mengyao
    Huang, Min
    Hossain, M. Shamim
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (05)
  • [48] EMNet: A Novel Few-Shot Image Classification Model with Enhanced Self-Correlation Attention and Multi-Branch Joint Module
    Li, Fufang
    Zhang, Weixiang
    Shang, Yi
    BIOMIMETICS, 2025, 10 (01)
  • [49] Structural Attention Enhanced Continual Meta-Learning for Graph Edge Labeling Based Few-Shot Remote Sensing Scene Classification
    Li, Feimo
    Li, Shuaibo
    Fan, Xinxin
    Li, Xiong
    Chang, Hongxing
    REMOTE SENSING, 2022, 14 (03)
  • [50] Exploring the potential of using ChatGPT for rhetorical move-step analysis: The impact of prompt refinement, few-shot learning, and fine-tuning
    Kim, Minjin
    Lu, Xiaofei
    JOURNAL OF ENGLISH FOR ACADEMIC PURPOSES, 2024, 71