Prompt-matching synthesis model for missing modalities in sentiment analysis

Cited by: 0
Authors
Liu, Jiaqi [1]
Wang, Yong [1]
Yang, Jing [1,2]
Shang, Fanshu [1]
He, Fan [1]
Affiliations
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China
[2] Yantai Inst Sci & Technol, Sch Data Intelligence, Yantai 265600, Shandong, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Multimodal sentiment analysis; Uncertain missing modalities; Prompt learning; Bidirectional cross-modal matching; Synthesized modal features;
DOI
10.1016/j.knosys.2025.113519
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In multimodal sentiment analysis, commentary videos often lack certain sentences or frames, leaving gaps that may contain crucial sentiment cues. Current methods focus primarily on modal fusion and overlook the uncertainty of missing modalities, which leads to underutilized data and incomplete, less accurate sentiment analysis. To address these challenges, we propose a prompt-matching synthesis model to handle missing modalities in sentiment analysis. First, we develop unimodal encoders that use prompt learning to enhance the model's understanding of inter-modal relationships during feature extraction: learnable prompts are prepended to the textual modality, while cross-modal prompts are applied to the acoustic and visual modalities. Second, we implement bidirectional cross-modal matching to minimize discrepancies among shared features, applying a central moment discrepancy loss across the modalities. A comparator is designed to infer features when one or two modalities are absent, enabling the synthesis of missing-modality features from the available data. Finally, the synthesized modal features are integrated with the initial features, and the fusion loss and central moment discrepancy loss are jointly optimized to improve sentiment analysis accuracy. Experimental results demonstrate that our method achieves strong performance on multiple multimodal sentiment analysis datasets, even under uncertain missing modalities.
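Two components named in the abstract correspond to well-known building blocks: learnable prompt tokens prepended to the text sequence, and a central moment discrepancy (CMD) term that penalizes differences between the moment statistics of shared features. The sketch below illustrates both under assumptions of our own: the class name PromptedTextEncoder, the embedding size, prompt length, transformer backbone, and moment order k are all illustrative, and the CMD shown is the standard formulation (without its interval normalization), not necessarily the paper's exact variant.

```python
import torch
import torch.nn as nn


class PromptedTextEncoder(nn.Module):
    """Prepends learnable prompt vectors to the text token embeddings.

    Sketch of the 'learnable prompts before textual modalities' idea;
    prompt length, embedding size, and the transformer backbone are
    illustrative assumptions, not the paper's configuration.
    """

    def __init__(self, embed_dim: int = 128, n_prompts: int = 4):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, embed_dim)
        batch = token_embeds.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # Prompt tokens are attended to alongside the real tokens.
        return self.encoder(torch.cat([prompts, token_embeds], dim=1))


def cmd_loss(x: torch.Tensor, y: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Simplified k-order central moment discrepancy between two batches.

    Matches the means, then the 2nd..k-th central moments of each feature
    dimension (the interval normalization of the original CMD is omitted
    for brevity). x, y: (batch, feature_dim) shared-space features.
    """
    mx, my = x.mean(dim=0), y.mean(dim=0)
    loss = torch.norm(mx - my, p=2)  # mean-matching term
    cx, cy = x - mx, y - my          # centered features
    for order in range(2, k + 1):
        loss = loss + torch.norm(cx.pow(order).mean(dim=0)
                                 - cy.pow(order).mean(dim=0), p=2)
    return loss


# Illustrative use: align pooled text features with audio features.
enc = PromptedTextEncoder()
text_feat = enc(torch.randn(8, 20, 128)).mean(dim=1)  # (8, 128)
audio_feat = torch.randn(8, 128)
print(cmd_loss(text_feat, audio_feat, k=5))
```

Matching low-order moment statistics gives a cheap, non-adversarial alignment signal, which is presumably why CMD-style losses suit shared-subspace multimodal models; the paper's actual pairing of modalities and loss weighting cannot be recovered from this record.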
Pages: 16