Toward Robust Multimodal Sentiment Analysis Using Multimodal Foundational Models

Cited by: 0
Authors
Zhao, Xianbing [1 ]
Poria, Soujanya [2 ]
Li, Xuejiao [1 ]
Chen, Yixin [1 ]
Tang, Buzhou [1 ,3 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Singapore Univ Technol & Design, Singapore, Singapore
[3] Pengcheng Natl Lab, Shenzhen, Peoples R China
Keywords
Multimodal sentiment analysis; Missing modality; Semantic match; Multimodal foundational models;
DOI
10.1016/j.eswa.2025.126974
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Existing multimodal sentiment analysis tasks rely heavily on the assumption that both the training and test sets contain complete multimodal data, yet this assumption is often difficult to satisfy: multimodal data are frequently incomplete in real-world scenarios. A model that remains robust when modalities are randomly missing is therefore highly desirable. Recently, CLIP-based multimodal foundational models have demonstrated impressive performance on numerous multimodal tasks by learning aligned cross-modal semantics from image-text pairs, but these foundational models are likewise unable to directly handle scenarios with absent modalities. To alleviate this issue, we propose a simple and effective framework, TRML (Toward Robust Multimodal Sentiment Analysis using Multimodal Foundational Models). TRML replaces missing modalities with generated virtual modalities and aligns the semantic spaces of the generated and missing modalities. Concretely, we design a missing modality inference module that generates virtual modalities to stand in for missing ones, and a semantic matching learning module that aligns the semantic spaces of the generated and missing modalities. Prompted by the complete modality, our model captures the semantics of missing modalities by leveraging the aligned cross-modal semantic space. Experiments on three multimodal sentiment analysis benchmark datasets, CMU-MOSI, CMU-MOSEI, and MELD, demonstrate the superiority of our approach.
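The abstract describes aligning the semantic space of a generated virtual modality with that of the real (possibly missing) modality. The paper does not specify its loss in this record, so the following is only a minimal illustrative sketch, assuming embeddings are plain vectors and using a hypothetical cosine-similarity alignment objective of the kind commonly paired with CLIP-style encoders; the function names (`semantic_match_loss`, `cosine`) are illustrative, not from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_match_loss(generated, target):
    # Alignment objective: 1 - cos(generated, target).
    # Minimizing this pulls the generated virtual-modality embedding
    # toward the real modality's direction in the shared semantic space.
    return 1.0 - cosine(generated, target)

# Toy example: a virtual image embedding inferred from text should be
# driven toward the real image embedding's direction during training.
real_img = [0.6, 0.8, 0.0]
virtual_img = [0.5, 0.7, 0.1]
loss = semantic_match_loss(virtual_img, real_img)
```

At inference time, under this sketch, the virtual embedding would simply substitute for the missing modality's embedding, since both now live in the same aligned space.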
Pages: 10
相关论文
共 50 条
  • [1] Multimodal Sentiment Analysis: Sentiment Analysis Using Audiovisual Format
    Yadav, Sumit K.
    Bhushan, Mayank
    Gupta, Swati
    2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015, : 1415 - 1419
  • [2] ENSEMBLE MODELS FOR MULTIMODAL SENTIMENT ANALYSIS USING TEXTUAL AND IMAGE FUSION
    Bolcas, Radu-Daniel
    Ciuc, Mihai
    Popovici, Eduard-Cristian
    UNIVERSITY POLITEHNICA OF BUCHAREST SCIENTIFIC BULLETIN SERIES C-ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, 2024, 86 (04): : 279 - 290
  • [3] Efficient Multimodal Transformer With Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis
    Sun, Licai
    Lian, Zheng
    Liu, Bin
    Tao, Jianhua
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (01) : 309 - 325
  • [4] Multimodal Sentiment Analysis Using Deep Learning
    Sharma, Rakhee
    Le Ngoc Tan
    Sadat, Fatiha
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 1475 - 1478
  • [5] Prompt Link Multimodal Fusion in Multimodal Sentiment Analysis
    Zhu, Kang
    Fan, Cunhang
    Tao, Jianhua
    Lv, Zhao
    INTERSPEECH 2024, 2024, : 4668 - 4672
  • [6] Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors
    Wu, Yang
    Zhao, Yanyan
    Yang, Hao
    Chen, Song
    Qin, Bing
    Cao, Xiaohuan
    Zhao, Wenting
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1397 - 1406
  • [7] Sentiment-aware multimodal pre-training for multimodal sentiment analysis
    Ye, Junjie
    Zhou, Jie
    Tian, Junfeng
    Wang, Rui
    Zhou, Jingyi
    Gui, Tao
    Zhang, Qi
    Huang, Xuanjing
    KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [8] A Survey on Multimodal Sentiment Analysis
    Zhang Y.
    Rong L.
    Song D.
    Zhang P.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2020, 33 (05): : 426 - 438
  • [9] A survey of multimodal sentiment analysis
    Soleymani, Mohammad
    Garcia, David
    Jou, Brendan
    Schuller, Bjoern
    Chang, Shih-Fu
    Pantic, Maja
    IMAGE AND VISION COMPUTING, 2017, 65 : 3 - 14
  • [10] Multimodal Blockwise Transformer for Robust Sentiment Recognition
    Lai, Zhengqin
    Hong, Xiaopeng
    Wang, Yabin
    PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMODAL AND RESPONSIBLE AFFECTIVE COMPUTING, MRAC 2024, 2024, : 88 - 92