Disentanglement Translation Network for multimodal sentiment analysis

Cited by: 30
Authors
Zeng, Ying [1 ]
Yan, Wenjun [1 ]
Mai, Sijie [1 ]
Hu, Haifeng [1 ]
Affiliation
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, Peoples R China
Keywords
Multimodal representation learning; Disentanglement learning; Multimodal sentiment analysis; Feature reconstruction; FUSION;
DOI
10.1016/j.inffus.2023.102031
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Obtaining an effective joint representation has always been the goal for multimodal tasks. However, a distributional gap inevitably exists due to the heterogeneous nature of different modalities, which places a burden on the fusion process and the learning of multimodal representation. The imbalance of modality dominance further aggravates this problem, where inferior modalities may contain much redundancy that introduces additional variations. To address the aforementioned issues, we propose a Disentanglement Translation Network (DTN) with Slack Reconstruction to capture desirable information properties, obtain a unified feature distribution and reduce redundancy. Specifically, an encoder-decoder-based disentanglement framework is adopted to decouple the unimodal representations into modality-common and modality-specific subspaces, which explore the cross-modal commonality and diversity, respectively. In the encoding stage, to narrow down the discrepancy, a two-stage translation is devised to work in concert with the disentanglement learning framework. The first stage aims to learn a modality-invariant embedding for modality-common information with an adversarial learning strategy, capturing the commonality shared across modalities. The second stage considers the modality-specific information that reveals diversity. To relieve the burden of multimodal fusion, we realize Specific-Common Distribution Matching to further unify the distribution of the desirable information. As for the decoding and reconstruction stage, we propose Slack Reconstruction to seek a balance between retaining discriminative information and reducing redundancy. Although the commonly used reconstruction loss with its strict constraint lowers the risk of information loss, it easily leads to the preservation of information redundancy. In contrast, Slack Reconstruction imposes a more relaxed constraint so that the redundancy is not forced to be retained, and simultaneously explores the inter-sample relationships. The proposed method aids multimodal fusion by learning the desired properties and obtaining a more uniform distribution for cross-modal data, and reduces information redundancy to further ensure feature effectiveness. Extensive experiments on the task of multimodal sentiment analysis demonstrate the effectiveness of the proposed method. The codes are available at https://github.com/zengy268/DTN.
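The contrast between strict and slack reconstruction can be sketched in plain Python. The abstract does not give DTN's exact loss, so the formulation below (including the function names) is an illustrative assumption: a strict element-wise MSE forces the decoder to reproduce every feature value, redundancy included, whereas a relaxed loss that only matches inter-sample cosine similarities tolerates variation, such as uniform scaling, that carries no discriminative structure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two (nonzero) feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def strict_reconstruction_loss(orig, recon):
    """Element-wise MSE: every value must be reproduced exactly."""
    n = sum(len(x) for x in orig)
    return sum((a - b) ** 2
               for x, y in zip(orig, recon)
               for a, b in zip(x, y)) / n

def slack_reconstruction_loss(orig, recon):
    """Illustrative relaxed constraint: preserve the inter-sample
    similarity structure of the batch rather than exact values."""
    loss, pairs = 0.0, 0
    for i in range(len(orig)):
        for j in range(i + 1, len(orig)):
            loss += (cosine(orig[i], orig[j]) - cosine(recon[i], recon[j])) ** 2
            pairs += 1
    return loss / pairs

batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # original unimodal features
recon = [[2.0 * a for a in x] for x in batch]  # decoder output: scaled copy
print(strict_reconstruction_loss(batch, recon))  # ≈ 0.667: exact values differ
print(slack_reconstruction_loss(batch, recon))   # 0.0: pairwise structure kept
```

Under the strict loss, the scaled reconstruction is heavily penalized even though it preserves all relative structure; the slack loss treats it as a perfect reconstruction, which is the kind of freedom that lets redundancy be dropped.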
Pages: 12