Robust Multimodal Representation under Uncertain Missing Modalities

Citations: 0
Authors
Lan, Guilin [1 ]
Du, Yeqian [1 ]
Yang, Zhouwang [1 ]
Affiliations
[1] University of Science and Technology of China, Hefei, People's Republic of China
Funding
National Key Research and Development Program of China;
Keywords
Multimodal representation; Missing modalities; Multimodal sentiment analysis; Multimedia;
DOI
10.1145/3702003
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Multimodal representation learning has gained significant attention across various fields, yet it faces challenges when dealing with missing modalities in real-world applications. Existing solutions are confined to specific scenarios, such as a single missing modality or modalities missing only at test time, which restricts their applicability. To address the more general scenario of uncertain missing modalities in both training and testing, we propose RMRU, a framework that projects each modality's representation into a shared subspace, enabling the reconstruction of any missing modality within a unified model. We propose an interaction refinement module that uses cross-modal attention to enhance these reconstructions, which is particularly beneficial when complete-modality data is scarce. Furthermore, we introduce an iterative training strategy that alternately trains the different modules to effectively exploit both complete and incomplete modality data. Experimental results on four benchmark datasets demonstrate the superiority of RMRU over existing baselines, particularly under high missing-modality rates. Notably, RMRU applies broadly across scenarios, regardless of the types and number of modalities.
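The reconstruction pipeline the abstract describes — project each modality into a shared subspace, form a rough estimate of a missing modality there, then refine it with cross-modal attention — can be illustrated with a toy NumPy sketch. All shapes, the random projection matrices, the mean-pooling initialization, and the single attention step below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, W):
    """Map modality-specific features into the shared subspace."""
    return x @ W

def cross_modal_attention(query, keys, values):
    """Scaled dot-product attention: refine a crude reconstruction
    (query) by attending over the available modality (keys/values)."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ values

# Toy setup: text (dim 8) and audio (dim 6) share a 4-d subspace.
W_text = rng.normal(size=(8, 4))
W_audio = rng.normal(size=(6, 4))        # unused here: audio is missing
text_feats = rng.normal(size=(5, 8))     # 5 time steps of text features

# Audio is missing: start from a crude shared-space estimate (mean of the
# observed text projections), then refine it with cross-modal attention.
z_text = project(text_feats, W_text)
z_audio_init = np.repeat(z_text.mean(axis=0, keepdims=True), 5, axis=0)
z_audio_refined = cross_modal_attention(z_audio_init, z_text, z_text)

print(z_audio_refined.shape)             # one shared-space vector per step
```

In the full method this refinement would be learned end to end; the sketch only shows how an available modality can supply the keys and values that sharpen a reconstruction of a missing one in the shared subspace.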
Pages: 23