Robust Multimodal Representation under Uncertain Missing Modalities

Authors:

Lan, Guilin [1]

Du, Yeqian [1]

Yang, Zhouwang [1]

Affiliations:

[1] University of Science and Technology of China, Hefei, People's Republic of China

Source:

ACM Transactions on Multimedia Computing, Communications, and Applications | 2025, Vol. 21, No. 1

Funding:

National Key Research and Development Program of China

Keywords:

Multimodal representation; Missing modalities; Multimodal sentiment analysis; Multimedia

DOI:

10.1145/3702003

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Multimodal representation learning has gained significant attention across various fields, yet it faces challenges when dealing with missing modalities in real-world applications. Existing solutions are confined to specific scenarios, such as a single missing modality or modalities missing only at test time, which restricts their applicability. To address the more general scenario of uncertain missing modalities in both training and testing, the proposed RMRU framework projects each modality's representation into a shared subspace, enabling the reconstruction of any missing modalities within a unified model. We propose an interaction refinement module that uses cross-modal attention to enhance these reconstructions, which is particularly beneficial when complete-modality data are limited. Furthermore, we introduce an iterative training strategy that alternately trains different modules to make effective use of both complete and incomplete modality data. Experimental results on four benchmark datasets demonstrate the superiority of RMRU over existing baselines, particularly in scenarios with a high rate of missing modalities. Notably, RMRU can be applied to diverse scenarios, regardless of the types and number of modalities.
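The shared-subspace idea in the abstract can be illustrated with a toy sketch. This is not the authors' RMRU implementation: the modality names, feature sizes, random linear encoders and decoders, and the averaging step are all assumptions made for illustration, and the cross-modal attention refinement and the iterative training strategy are omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes per modality and size of the shared subspace.
DIMS = {"text": 8, "audio": 5, "vision": 6}
SHARED_DIM = 4

# Assumed untrained linear maps: modality -> shared and shared -> modality.
encoders = {m: rng.normal(size=(d, SHARED_DIM)) for m, d in DIMS.items()}
decoders = {m: rng.normal(size=(SHARED_DIM, d)) for m, d in DIMS.items()}


def reconstruct_missing(sample):
    """Fill in every modality whose value is None from the ones present."""
    present = {m: x for m, x in sample.items() if x is not None}
    if not present:
        raise ValueError("at least one modality must be present")
    # Project each available modality into the shared subspace and average.
    shared = np.mean([x @ encoders[m] for m, x in present.items()], axis=0)
    # Decode the shared code into the feature space of each missing modality.
    return {
        m: (x if x is not None else shared @ decoders[m])
        for m, x in sample.items()
    }


# Usage: the audio features are missing and get reconstructed.
sample = {"text": np.ones(8), "audio": None, "vision": np.ones(6)}
filled = reconstruct_missing(sample)
print(filled["audio"].shape)
```

Because every modality maps into the same subspace, one set of encoders and decoders covers any pattern of missing inputs, which is the property the abstract attributes to a unified model. In a real system the maps would be learned rather than random.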

Pages: 23
