Robust Multimodal Representation under Uncertain Missing Modalities

Citations: 0
Authors
Lan, Guilin [1 ]
Du, Yeqian [1 ]
Yang, Zhouwang [1 ]
Affiliations
[1] University of Science and Technology of China, Hefei, People's Republic of China
Funding
National Key Research and Development Program of China;
Keywords
Multimodal representation; Missing modalities; Multimodal sentiment analysis; Multimedia;
DOI
10.1145/3702003
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Multimodal representation learning has gained significant attention across various fields, yet it faces challenges when dealing with missing modalities in real-world applications. Existing solutions are confined to specific scenarios, such as a single missing modality or modalities missing only at test time, which restricts their applicability. To address the more general scenario of uncertain missing modalities in both training and testing, we propose RMRU, a framework that projects each modality's representation into a shared subspace, enabling the reconstruction of any missing modality within a unified model. We propose an interaction refinement module that uses cross-modal attention to enhance these reconstructions, which is particularly beneficial when complete-modality data is scarce. Furthermore, we introduce an iterative training strategy that alternately trains the different modules to effectively exploit both complete and incomplete modality data. Experimental results on four benchmark datasets demonstrate the superiority of RMRU over existing baselines, particularly under high missing-modality rates. Notably, RMRU applies broadly across scenarios, regardless of the types and number of modalities.
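The reconstruction pipeline the abstract describes — project each modality into a shared subspace, form a rough estimate of a missing modality there, then refine it with cross-modal attention — can be illustrated with a toy NumPy sketch. All shapes, the random projection matrices, the mean-pooling initialization, and the single attention step below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, W):
    """Map modality-specific features into the shared subspace."""
    return x @ W

def cross_modal_attention(query, keys, values):
    """Scaled dot-product attention: refine a crude reconstruction
    (query) by attending over the available modality (keys/values)."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ values

# Toy setup: text (dim 8) and audio (dim 6) share a 4-d subspace.
W_text = rng.normal(size=(8, 4))
W_audio = rng.normal(size=(6, 4))        # unused here: audio is missing
text_feats = rng.normal(size=(5, 8))     # 5 time steps of text features

# Audio is missing: start from a crude shared-space estimate (mean of the
# observed text projections), then refine it with cross-modal attention.
z_text = project(text_feats, W_text)
z_audio_init = np.repeat(z_text.mean(axis=0, keepdims=True), 5, axis=0)
z_audio_refined = cross_modal_attention(z_audio_init, z_text, z_text)

print(z_audio_refined.shape)             # one shared-space vector per step
```

In the full method this refinement would be learned end to end; the sketch only shows how an available modality can supply the keys and values that sharpen a reconstruction of a missing one in the shared subspace.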
Pages: 23