Disentangled variational auto-encoder for multimodal fusion performance analysis in multimodal sentiment analysis

Cited by: 0
Authors
Chen, Rongfei [1 ]
Zhou, Wenju [1 ]
Hu, Huosheng [2 ]
Fei, Zixiang [3 ]
Fei, Minrui [1 ]
Zhou, Hao [4 ]
Affiliations
[1] Shanghai Univ, Sch Mechatron Engn & Automat, Shanghai Key Lab Power Stn Automat Technol, Shanghai 200444, Peoples R China
[2] Univ Essex, Sch Comp Sci & Elect Engn, Colchester CO4 3SQ, England
[3] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[4] Univ Oxford, Dept Comp Sci, Oxford OX1 2JD, England
Keywords
Multimodal sentiment analysis; Model performance evaluation; Disentangled representation learning; Explainability
DOI
10.1016/j.knosys.2024.112372
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal Sentiment Analysis (MSA) has broad applicability owing to its capacity to analyze and interpret users' emotions, feelings, and perspectives by integrating complementary information from multiple modalities. However, inefficient and unbalanced cross-modal information fusion substantially undermines the accuracy and reliability of MSA models. Consequently, a critical challenge in the field is effectively assessing the information integration capabilities of these models to ensure balanced and equitable processing of multimodal data. In this paper, a Disentanglement-based Variational Auto-Encoder (DVAE) is proposed for systematically assessing fusion performance and investigating the factors that facilitate multimodal fusion. Specifically, a distribution constraint module is presented to decouple the fusion matrices and generate multiple low-dimensional, trustworthy disentangled latent vectors that adhere to the authentic unimodal input distributions. In addition, a combined loss term is designed to balance the inductive bias, signal reconstruction, and distribution constraint components, facilitating the optimization of the network's weights and parameters. With the proposed evaluation method, the fusion performance of multimodal models can be assessed by contrasting the classification degradation ratios obtained from the disentangled hidden representations and the joint representations. Experiments with eight state-of-the-art multimodal fusion methods on the CMU-MOSI and CMU-MOSEI benchmark datasets demonstrate that DVAE can effectively evaluate the effects of multimodal fusion. Moreover, the comparative results indicate that both the equalizing effect among various advanced mechanisms in multimodal sentiment analysis and the single-peak characteristic of the ground-truth label distribution contribute significantly to multimodal data fusion.
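The evaluation idea in the abstract, comparing classification performance from the joint (fused) representation against performance from the disentangled per-modality latents, can be sketched as follows. This is a minimal illustrative example, not the paper's implementation; the function names, the toy accuracy values, and the exact ratio definition are assumptions for illustration.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def degradation_ratio(joint_acc, disentangled_acc):
    """Relative accuracy drop when classifying from a single disentangled
    latent instead of the joint representation. Under this reading of the
    abstract, a large ratio suggests the fusion contributed information
    beyond that one modality; a small ratio suggests it did not."""
    return (joint_acc - disentangled_acc) / joint_acc

# Toy numbers: the joint representation classifies 90% of samples
# correctly, while a single disentangled (e.g. text-only) latent
# recovered by the auto-encoder classifies 75% correctly.
joint_acc = 0.90
text_latent_acc = 0.75
ratio = degradation_ratio(joint_acc, text_latent_acc)
```

Comparing such ratios across modalities and across fusion methods is one way to make "balanced" fusion measurable, which is the role the abstract assigns to DVAE.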
Pages: 19
Related papers
50 items total
  • [21] Attention fusion network for multimodal sentiment analysis
    Luo, Yuanyi
    Wu, Rui
    Liu, Jiafeng
    Tang, Xianglong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (03) : 8207 - 8217
  • [23] Learning Disentangled Representation for Multimodal Cross-Domain Sentiment Analysis
    Zhang, Yuhao
    Zhang, Ying
    Guo, Wenya
    Cai, Xiangrui
    Yuan, Xiaojie
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (10) : 7956 - 7966
  • [24] Trustworthy Multimodal Fusion for Sentiment Analysis in Ordinal Sentiment Space
    Xie, Zhuyang
    Yang, Yan
    Wang, Jie
    Liu, Xiaorong
    Li, Xiaofan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7657 - 7670
  • [25] Multimodal Brain Growth Patterns: Insights from Canonical Correlation Analysis and Deep Canonical Correlation Analysis with Auto-Encoder
    Sapkota, Ram
    Thapaliya, Bishal
    Ray, Bhaskar
    Suresh, Pranav
    Liu, Jingyu
    INFORMATION, 2025, 16 (03)
  • [26] Serendipity adjustable application recommendation via joint disentangled recurrent variational auto-encoder
    Lee, Younghoon
    ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS, 2020, 44
  • [27] A transformer-encoder-based multimodal multi-attention fusion network for sentiment analysis
    Liu, Cong
    Wang, Yong
    Yang, Jing
    APPLIED INTELLIGENCE, 2024, 54 (17-18) : 8415 - 8441
  • [28] Automating safety critical ultrasonic data analysis with a variational auto-encoder
    Torenvliet, Nick
    Liu, Yizhe
    Zelek, John
    2023 IEEE SENSORS APPLICATIONS SYMPOSIUM, SAS, 2023,
  • [29] Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis
    Han, Wei
    Chen, Hui
    Poria, Soujanya
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9180 - 9192
  • [30] Fusion-Extraction Network for Multimodal Sentiment Analysis
    Jiang, Tao
    Wang, Jiahai
    Liu, Zhiyue
    Ling, Yingbiao
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 785 - 797