Prompt-matching synthesis model for missing modalities in sentiment analysis

Cited: 0
Authors
Liu, Jiaqi [1 ]
Wang, Yong [1 ]
Yang, Jing [1 ,2 ]
Shang, Fanshu [1 ]
He, Fan [1 ]
Affiliations
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China
[2] Yantai Inst Sci & Technol, Sch Data Intelligence, Yantai 265600, Shandong, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Multimodal sentiment analysis; Uncertain missing modalities; Prompt learning; Bidirectional cross-modal matching; Synthesized modal features;
DOI
10.1016/j.knosys.2025.113519
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In multimodal sentiment analysis, commentary videos often lack certain sentences or frames, leaving gaps that may contain crucial sentiment cues. Current methods primarily focus on modal fusion, overlooking the uncertainty of missing modalities, which results in underutilized data and less complete and less accurate sentiment analysis. To address these challenges, we propose a prompt-matching synthesis model to handle missing modalities in sentiment analysis. First, we develop unimodal encoders using prompt learning to enhance the model's understanding of inter-modal relationships during feature extraction. Learnable prompts are introduced before textual modalities, while cross-modal prompts are applied to acoustic and visual modalities. Second, we implement bidirectional cross-modal matching to minimize discrepancies among shared features, employing central moment discrepancy loss across multiple modalities. A comparator is designed to infer features based on the absence of one or two modalities, allowing for the synthesis of missing modality features from available data. Finally, the synthesized modal features are integrated with the initial features, optimizing the fusion loss and central moment discrepancy loss to enhance sentiment analysis accuracy. Experimental results demonstrate that our method achieves strong performance on multiple datasets for multimodal sentiment analysis, even with uncertain missing modalities.
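The bidirectional cross-modal matching step minimizes a central moment discrepancy (CMD) loss between shared features of different modalities. As a rough illustration only, here is a minimal NumPy sketch of CMD in its commonly used form (mean distance plus distances between higher-order central moments); the paper's exact formulation, moment order, and normalization are not specified in this record, so all parameter choices below are assumptions.

```python
import numpy as np

def cmd_loss(x, y, k=5):
    """Central Moment Discrepancy between two feature batches.

    x, y: arrays of shape (n_samples, n_features), assumed scaled
    to a bounded interval so higher-order moments stay comparable.
    Returns the L2 distance between the batch means plus the L2
    distances between the 2nd..k-th central moments.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    loss = np.linalg.norm(mx - my)            # first-order (mean) term
    cx, cy = x - mx, y - my                   # centered features
    for order in range(2, k + 1):
        # distance between the order-th central moments of x and y
        loss += np.linalg.norm((cx ** order).mean(axis=0) -
                               (cy ** order).mean(axis=0))
    return loss
```

In a setup like the one described above, such a loss would be applied pairwise between the shared features of the available modalities (e.g. text–acoustic, text–visual) and added to the fusion loss during training.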
Pages: 16