Enhancing Cross-Modal Alignment in Multimodal Sentiment Analysis via Prompt Learning

Times Cited: 0
Authors
Wang, Xiaofan [1 ]
Li, Xiuhong [1 ]
Li, Zhe [2 ,3 ]
Zhou, Chenyu [1 ]
Chen, Fan [1 ]
Yang, Dan [1 ]
Affiliations
[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Source
PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024 | 2025, Vol. 15035
Keywords
Prompt learning; Multimodal sentiment analysis; Alignment
DOI
10.1007/978-981-97-8620-6_37
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Multimodal sentiment analysis (MSA) aims to predict the sentiment expressed in paired images and texts. Cross-modal feature alignment is crucial for models to understand the context and extract complementary semantic features. However, most previous MSA approaches show deficiencies in aligning features across modalities. Experimental evidence shows that prompt learning can align features effectively, and prior studies have applied prompt learning to MSA, but only in a unimodal context; applying prompt learning to multimodal feature alignment remains a challenge. This paper presents a multimodal sentiment analysis model based on alignment prompts (MSAPL). The model generates text and image alignment prompts via the Kronecker product, strengthening the engagement of the visual modality and the correlation between image and text data, and thus enabling a better understanding of multimodal inputs. It also employs a multi-layer, stepwise learning approach to acquire textual and image features, progressively modeling the relationships between stage-wise features for richer contextual learning. Experiments on three public datasets demonstrate that our model consistently outperforms all baseline models.
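The record gives no implementation details, but the following minimal PyTorch sketch illustrates one plausible way a Kronecker product of pooled text and image features could be turned into cross-modal alignment prompt tokens, as the abstract describes. All names, dimensions, and the low-rank projections below are illustrative assumptions, not the authors' MSAPL code.

```python
# Illustrative sketch (not the authors' implementation): fusing pooled text
# and image features via a Kronecker product to produce alignment prompts.
import torch
import torch.nn as nn


class AlignmentPromptGenerator(nn.Module):
    def __init__(self, text_dim=768, image_dim=768,
                 prompt_len=4, prompt_dim=768, rank=16):
        super().__init__()
        # Low-rank projections keep the Kronecker product small (rank*rank entries).
        self.text_proj = nn.Linear(text_dim, rank)
        self.image_proj = nn.Linear(image_dim, rank)
        # Map the fused vector to a short sequence of prompt tokens.
        self.to_prompt = nn.Linear(rank * rank, prompt_len * prompt_dim)
        self.prompt_len = prompt_len
        self.prompt_dim = prompt_dim

    def forward(self, text_feat, image_feat):
        # text_feat:  (B, text_dim)  pooled text features (e.g., BERT [CLS])
        # image_feat: (B, image_dim) pooled image features (e.g., ViT [CLS])
        t = self.text_proj(text_feat)    # (B, rank)
        v = self.image_proj(image_feat)  # (B, rank)
        # Kronecker product of two vectors = flattened outer product:
        # every text component interacts with every image component.
        fused = torch.einsum("bi,bj->bij", t, v).flatten(1)  # (B, rank*rank)
        prompts = self.to_prompt(fused)                        # (B, prompt_len*prompt_dim)
        return prompts.view(-1, self.prompt_len, self.prompt_dim)


# Usage: the generated tokens would be prepended to the text encoder's input
# so that both modalities condition the prompt.
gen = AlignmentPromptGenerator()
text_feat, image_feat = torch.randn(2, 768), torch.randn(2, 768)
print(gen(text_feat, image_feat).shape)  # torch.Size([2, 4, 768])
```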
Pages: 541-554
Number of Pages: 14