Multimodal Sentiment Analysis Network Based on Distributional Transformation and Gated Cross-Modal Fusion

Cited by: 0
Authors
Zhang, Yuchen [1]
Thong, Hong [1]
Chen, Guilin [2]
Alhusaini, Naji [2]
Zhao, Shenghui [2]
Wu, Cheng [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei, Peoples R China
[2] Chuzhou Univ, Sch Comp & Informat Engn, Chuzhou, Peoples R China
Source
2024 INTERNATIONAL CONFERENCE ON NETWORKING AND NETWORK APPLICATIONS, NANA 2024 | 2024
Keywords
multimodal sentiment analysis; data fusion; distribution transformation; joint representation;
DOI
10.1109/NaNA63151.2024.00088
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
Multimodal sentiment analysis aims to combine text, audio, and video modalities to extract sentiment information. Existing research focuses on representation learning and feature fusion, but because of differences in data distribution across modalities, fusion models often struggle to capture inter-modal correlations effectively; in particular, they tend to ignore unimodal distribution differences, which degrades fusion performance. In addition, text features carry the most weight in multimodal sentiment analysis, which makes fusing verbal and non-verbal information more challenging. To address these issues, we propose the Cross-Modal Joint Representation Interaction Network (CMJN), which quantifies the distributional differences between modalities through a Distributional Transformation Layer (DTL) and learns joint representations of verbal and non-verbal modalities with a Gated Cross-Modal Transformer (GCT) to capture inter-modal coherence and complementarity. Experimental results show that CMJN significantly improves multimodal sentiment analysis on the CMU-MOSI and CMU-MOSEI datasets.
Pages: 496-503
Page count: 8
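Since this record contains only the abstract, the following PyTorch snippet is a minimal, generic sketch of gated cross-modal attention fusion of the kind the abstract describes, not the authors' CMJN/GCT implementation; the class name `GatedCrossModalBlock` and all parameters are assumptions made for illustration.

```python
# Hypothetical sketch (not the paper's code): a non-verbal modality (audio or
# video) is attended from the text stream, and a learned gate controls how much
# of the cross-modal signal is mixed back into the text representation.
import torch
import torch.nn as nn


class GatedCrossModalBlock(nn.Module):
    """Illustrative gated cross-modal attention block (assumed design)."""

    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gate computed from the concatenation of text and attended features.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, nonverbal: torch.Tensor) -> torch.Tensor:
        # text:      (batch, seq_t, dim)  verbal features
        # nonverbal: (batch, seq_n, dim)  audio or video features
        attended, _ = self.attn(query=text, key=nonverbal, value=nonverbal)
        g = self.gate(torch.cat([text, attended], dim=-1))  # element-wise gate in [0, 1]
        return self.norm(text + g * attended)               # gated residual fusion


if __name__ == "__main__":
    block = GatedCrossModalBlock()
    t = torch.randn(2, 50, 128)   # text tokens
    a = torch.randn(2, 200, 128)  # audio frames
    print(block(t, a).shape)      # torch.Size([2, 50, 128])
```

The sigmoid gate is one plausible way to let the text stream keep its dominant role while admitting only the useful portion of the non-verbal signal; the paper's actual DTL and GCT designs may differ.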