Robust Multimodal Representation under Uncertain Missing Modalities

Cited by: 0
Authors
Lan, Guilin [1 ]
Du, Yeqian [1 ]
Yang, Zhouwang [1 ]
Institutions
[1] Univ Sci & Technol China, Hefei, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Multimodal representation; Missing modalities; Multimodal sentiment analysis; Multimedia;
DOI
10.1145/3702003
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
Multimodal representation learning has gained significant attention across various fields, yet it faces challenges when dealing with missing modalities in real-world applications. Existing solutions are confined to specific scenarios, such as a single missing modality or modalities missing only at test time, which restricts their applicability. To address the more general scenario of uncertain missing modalities in both training and testing, we propose RMRU, a framework that projects each modality's representation into a shared subspace, enabling the reconstruction of any missing modality within a unified model. We further propose an interaction refinement module that uses cross-modal attention to enhance these reconstructions, which is particularly beneficial when complete-modality data are limited. In addition, we introduce an iterative training strategy that alternately trains the different modules to effectively exploit both complete and incomplete modality data. Experimental results on four benchmark datasets demonstrate the superiority of RMRU over existing baselines, particularly under high missing-modality rates. Remarkably, RMRU can be broadly applied to diverse scenarios, regardless of modality types and quantities.
Pages: 23
Related papers
50 records in total
  • [41] Adaptive Fusion and Edge-Oriented Enhancement for Brain Tumor Segmentation With Missing Modalities
    Yan, Yulan
    Zhan, Yinwei
    He, Huiyao
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2025, 35 (01)
  • [42] Generative learning-based lightweight MRI brain tumor segmentation with missing modalities
    Zhang, Xinliang
    Chen, Qian
    He, Hangzhou
    Zhu, Lei
    Xie, Zhaoheng
    Lu, Yanye
    Cheng, Fangxiao
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 261
  • [43] ACN: Adversarial Co-training Network for Brain Tumor Segmentation with Missing Modalities
    Wang, Yixin
    Zhang, Yang
    Liu, Yang
    Lin, Zihao
    Tian, Jiang
    Zhong, Cheng
    Shi, Zhongchao
    Fan, Jianping
    He, Zhiqiang
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT VII, 2021, 12907 : 410 - 420
  • [44] Efficient Multimodal Transformer With Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis
    Sun, Licai
    Lian, Zheng
    Liu, Bin
    Tao, Jianhua
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (01) : 309 - 325
  • [45] SCANET: Improving multimodal representation and fusion with sparse- and cross-attention for multimodal sentiment analysis
    Wang, Hao
    Yang, Mingchuan
    Li, Zheng
    Liu, Zhenhua
    Hu, Jie
    Fu, Ziwang
    Liu, Feng
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33 (3-4)
  • [46] The multimodal representation of emotion in film: Integrating cognitive and semiotic approaches
    Feng, Dezheng
    O'Halloran, Kay L.
    SEMIOTICA, 2013, 197 : 79 - 100
  • [47] DRLN: Disentangled Representation Learning Network for Multimodal Sentiment Analysis
    Hou, Jingming
    Omar, Nazlia
    Tiun, Sabrina
    Saad, Saidah
    He, Qian
    NEURAL COMPUTING FOR ADVANCED APPLICATIONS, NCAA 2024, PT III, 2025, 2183 : 148 - 161
  • [48] Consumption as a Rhythm: A Multimodal Experiment on the Representation of Time-Series
    Macas, Catarina
    Martins, Pedro
    Machado, Penousal
    2018 22ND INTERNATIONAL CONFERENCE INFORMATION VISUALISATION (IV), 2018, : 504 - 509
  • [49] Learning speaker-independent multimodal representation for sentiment analysis
    Wang, Jianwen
    Wang, Shiping
    Lin, Mingwei
    Xu, Zeshui
    Guo, Wenzhong
    INFORMATION SCIENCES, 2023, 628 : 208 - 225
  • [50] Improving multimodal action representation with joint motion history context
    Malawski, Filip
    Kwolek, Bogdan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2019, 61 : 198 - 208