DRLN: Disentangled Representation Learning Network for Multimodal Sentiment Analysis

Cited: 0
Authors
Hou, Jingming [1]
Omar, Nazlia [1]
Tiun, Sabrina [1]
Saad, Saidah [1]
He, Qian [2]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Ctr Artificial Intelligence Technol, Bangi 43650, Selangor, Malaysia
[2] Guilin Univ Elect Technol, State & Local Joint Engn Res Ctr Satellite Nav &, Guangxi Key Lab Cryptog & Informat Secur, Guilin, Peoples R China
Source
NEURAL COMPUTING FOR ADVANCED APPLICATIONS, NCAA 2024, PT III | 2025 / Vol. 2183
Funding
National Natural Science Foundation of China
Keywords
Multimodal Sentiment Analysis; Multimodal Representation Learning; Contrastive Learning; Text-Centric
DOI
10.1007/978-981-97-7007-6_11
Chinese Library Classification
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
Multimodal sentiment analysis (MSA) analyzes human emotions by integrating affective features from the text, audio, and visual modalities. Much prior work has fused these three modalities on an equal footing, neglecting both the dominant role of the text modality and the effect of modality heterogeneity on fusion. To address this issue, we propose the Disentangled Representation Learning Network (DRLN), which decomposes the representations of the different modalities into separate subspaces centered on the text modality, capturing both the similarity and the dissimilarity features among modalities. We further employ a contrastive learning loss and a reconstruction loss to better learn the representations within each subspace. Extensive experiments on two benchmark datasets demonstrate that our model outperforms state-of-the-art methods on various metrics.
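To make the text-centric disentanglement concrete, below is a minimal PyTorch sketch of the idea the abstract describes: each modality is split into a text-aligned shared (similarity) subspace and a modality-specific private (dissimilarity) subspace, trained with an InfoNCE-style contrastive loss that anchors on the text modality plus a per-modality reconstruction loss. This is not the authors' implementation; all module names, dimensions, and loss weightings are illustrative assumptions.

```python
# Illustrative sketch of text-centric disentangled representation learning
# (assumed design; NOT the DRLN authors' released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledEncoder(nn.Module):
    """Splits one modality into a shared (similarity) subspace and a
    private (dissimilarity) subspace, plus a decoder for reconstruction."""
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.shared = nn.Linear(in_dim, hid_dim)       # similarity features
        self.private = nn.Linear(in_dim, hid_dim)      # dissimilarity features
        self.decoder = nn.Linear(2 * hid_dim, in_dim)  # reconstructs the input

    def forward(self, x):
        s, p = self.shared(x), self.private(x)
        recon = self.decoder(torch.cat([s, p], dim=-1))
        return s, p, recon

def text_centric_contrastive(text_s, other_s, temperature=0.1):
    """InfoNCE-style loss: for each sample, its own text feature is the
    positive and the other samples in the batch are negatives."""
    t = F.normalize(text_s, dim=-1)
    o = F.normalize(other_s, dim=-1)
    logits = t @ o.t() / temperature  # (B, B) cosine-similarity matrix
    labels = torch.arange(t.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy batch: random tensors stand in for pre-extracted text/audio/visual features.
B, D, H = 8, 64, 32
enc_t, enc_a, enc_v = (DisentangledEncoder(D, H) for _ in range(3))
x_t, x_a, x_v = torch.randn(B, D), torch.randn(B, D), torch.randn(B, D)

s_t, _, r_t = enc_t(x_t)
s_a, _, r_a = enc_a(x_a)
s_v, _, r_v = enc_v(x_v)

loss = (
    text_centric_contrastive(s_t, s_a)    # pull audio shared features toward text
    + text_centric_contrastive(s_t, s_v)  # pull visual shared features toward text
    + F.mse_loss(r_t, x_t) + F.mse_loss(r_a, x_a) + F.mse_loss(r_v, x_v)
)
loss.backward()
```

In this sketch only the text modality anchors the contrastive terms, so audio and visual shared features are pulled toward the text representation rather than toward each other, which is one plausible reading of the text-centric design; the paper's actual fusion and prediction modules are omitted.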
Pages: 148-161
Page count: 14