Prompt-matching synthesis model for missing modalities in sentiment analysis

Cited by: 0
Authors
Liu, Jiaqi [1]
Wang, Yong [1]
Yang, Jing [1,2]
Shang, Fanshu [1]
He, Fan [1]
Affiliations
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China
[2] Yantai Inst Sci & Technol, Sch Data Intelligence, Yantai 265600, Shandong, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Multimodal sentiment analysis; Uncertain missing modalities; Prompt learning; Bidirectional cross-modal matching; Synthesized modal features;
DOI
10.1016/j.knosys.2025.113519
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In multimodal sentiment analysis, commentary videos often lack certain sentences or frames, leaving gaps that may contain crucial sentiment cues. Current methods focus primarily on modal fusion and overlook the uncertainty of missing modalities, which leads to underutilized data and incomplete, less accurate sentiment analysis. To address these challenges, we propose a prompt-matching synthesis model to handle missing modalities in sentiment analysis. First, we develop unimodal encoders that use prompt learning to enhance the model's understanding of inter-modal relationships during feature extraction: learnable prompts are prepended to the textual modality, while cross-modal prompts are applied to the acoustic and visual modalities. Second, we implement bidirectional cross-modal matching to minimize discrepancies among shared features, applying a central moment discrepancy loss across the modalities. A comparator is designed to infer features when one or two modalities are absent, enabling the synthesis of missing-modality features from the available data. Finally, the synthesized modal features are integrated with the initial features, and the fusion loss and central moment discrepancy loss are jointly optimized to improve sentiment analysis accuracy. Experimental results demonstrate that our method achieves strong performance on multiple multimodal sentiment analysis datasets, even under uncertain missing modalities.
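Two components named in the abstract correspond to well-known building blocks: learnable prompt tokens prepended to the text sequence, and a central moment discrepancy (CMD) term that penalizes differences between the moment statistics of shared features. The sketch below illustrates both under assumptions of our own: the class name PromptedTextEncoder, the embedding size, prompt length, transformer backbone, and moment order k are all illustrative, and the CMD shown is the standard formulation (without its interval normalization), not necessarily the paper's exact variant.

```python
import torch
import torch.nn as nn


class PromptedTextEncoder(nn.Module):
    """Prepends learnable prompt vectors to the text token embeddings.

    Sketch of the 'learnable prompts before textual modalities' idea;
    prompt length, embedding size, and the transformer backbone are
    illustrative assumptions, not the paper's configuration.
    """

    def __init__(self, embed_dim: int = 128, n_prompts: int = 4):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, embed_dim)
        batch = token_embeds.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # Prompt tokens are attended to alongside the real tokens.
        return self.encoder(torch.cat([prompts, token_embeds], dim=1))


def cmd_loss(x: torch.Tensor, y: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Simplified k-order central moment discrepancy between two batches.

    Matches the means, then the 2nd..k-th central moments of each feature
    dimension (the interval normalization of the original CMD is omitted
    for brevity). x, y: (batch, feature_dim) shared-space features.
    """
    mx, my = x.mean(dim=0), y.mean(dim=0)
    loss = torch.norm(mx - my, p=2)  # mean-matching term
    cx, cy = x - mx, y - my          # centered features
    for order in range(2, k + 1):
        loss = loss + torch.norm(cx.pow(order).mean(dim=0)
                                 - cy.pow(order).mean(dim=0), p=2)
    return loss


# Illustrative use: align pooled text features with audio features.
enc = PromptedTextEncoder()
text_feat = enc(torch.randn(8, 20, 128)).mean(dim=1)  # (8, 128)
audio_feat = torch.randn(8, 128)
print(cmd_loss(text_feat, audio_feat, k=5))
```

Matching low-order moment statistics gives a cheap, non-adversarial alignment signal, which is presumably why CMD-style losses suit shared-subspace multimodal models; the paper's actual pairing of modalities and loss weighting cannot be recovered from this record.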
Pages: 16