Prompt-matching synthesis model for missing modalities in sentiment analysis

Cited: 0
Authors
Liu, Jiaqi [1 ]
Wang, Yong [1 ]
Yang, Jing [1 ,2 ]
Shang, Fanshu [1 ]
He, Fan [1 ]
Affiliations
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China
[2] Yantai Inst Sci & Technol, Sch Data Intelligence, Yantai 265600, Shandong, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Multimodal sentiment analysis; Uncertain missing modalities; Prompt learning; Bidirectional cross-modal matching; Synthesized modal features;
DOI
10.1016/j.knosys.2025.113519
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In multimodal sentiment analysis, commentary videos often lack certain sentences or frames, leaving gaps that may contain crucial sentiment cues. Current methods primarily focus on modal fusion, overlooking the uncertainty of missing modalities, which results in underutilized data and less complete and less accurate sentiment analysis. To address these challenges, we propose a prompt-matching synthesis model to handle missing modalities in sentiment analysis. First, we develop unimodal encoders using prompt learning to enhance the model's understanding of inter-modal relationships during feature extraction. Learnable prompts are introduced before textual modalities, while cross-modal prompts are applied to acoustic and visual modalities. Second, we implement bidirectional cross-modal matching to minimize discrepancies among shared features, employing central moment discrepancy loss across multiple modalities. A comparator is designed to infer features based on the absence of one or two modalities, allowing for the synthesis of missing modality features from available data. Finally, the synthesized modal features are integrated with the initial features, optimizing the fusion loss and central moment discrepancy loss to enhance sentiment analysis accuracy. Experimental results demonstrate that our method achieves strong performance on multiple datasets for multimodal sentiment analysis, even with uncertain missing modalities.
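The bidirectional cross-modal matching step minimizes a central moment discrepancy (CMD) loss between shared features of different modalities. As a rough illustration only, here is a minimal NumPy sketch of CMD in its commonly used form (mean distance plus distances between higher-order central moments); the paper's exact formulation, moment order, and normalization are not specified in this record, so all parameter choices below are assumptions.

```python
import numpy as np

def cmd_loss(x, y, k=5):
    """Central Moment Discrepancy between two feature batches.

    x, y: arrays of shape (n_samples, n_features), assumed scaled
    to a bounded interval so higher-order moments stay comparable.
    Returns the L2 distance between the batch means plus the L2
    distances between the 2nd..k-th central moments.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    loss = np.linalg.norm(mx - my)            # first-order (mean) term
    cx, cy = x - mx, y - my                   # centered features
    for order in range(2, k + 1):
        # distance between the order-th central moments of x and y
        loss += np.linalg.norm((cx ** order).mean(axis=0) -
                               (cy ** order).mean(axis=0))
    return loss
```

In a setup like the one described above, such a loss would be applied pairwise between the shared features of the available modalities (e.g. text–acoustic, text–visual) and added to the fusion loss during training.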
Pages: 16