Multimodal Sentiment Analysis Network Based on Distributional Transformation and Gated Cross-Modal Fusion

Cited by: 0
Authors
Zhang, Yuchen [1]
Thong, Hong [1]
Chen, Guilin [2]
Alhusaini, Naji [2]
Zhao, Shenghui [2]
Wu, Cheng [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Hefei, Peoples R China
[2] Chuzhou Univ, Sch Comp & Informat Engn, Chuzhou, Peoples R China
Source
2024 INTERNATIONAL CONFERENCE ON NETWORKING AND NETWORK APPLICATIONS, NANA 2024 | 2024
Keywords
multimodal sentiment analysis; data fusion; distribution transformation; joint representation;
DOI
10.1109/NaNA63151.2024.00088
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
Multimodal sentiment analysis aims to combine text, audio, and video modalities to extract sentiment information. Existing research focuses on representation learning and feature fusion, but because of differences in data distribution across modalities, fusion models often struggle to capture inter-modal correlations effectively; in particular, they tend to ignore unimodal distribution differences, which degrades fusion performance. In addition, text features carry the most weight in multimodal sentiment analysis, which makes fusing verbal and non-verbal information more challenging. To address these issues, we propose the Cross-Modal Joint Representation Interaction Network (CMJN), which quantifies the distributional differences between modalities through a Distributional Transformation Layer (DTL) and learns joint representations of verbal and non-verbal modalities with a Gated Cross-Modal Transformer (GCT) to capture inter-modal coherence and complementarity. Experimental results show that CMJN significantly improves multimodal sentiment analysis on the CMU-MOSI and CMU-MOSEI datasets.
Pages: 496-503
Page count: 8
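Since this record contains only the abstract, the following PyTorch snippet is a minimal, generic sketch of gated cross-modal attention fusion of the kind the abstract describes, not the authors' CMJN/GCT implementation; the class name `GatedCrossModalBlock` and all parameters are assumptions made for illustration.

```python
# Hypothetical sketch (not the paper's code): a non-verbal modality (audio or
# video) is attended from the text stream, and a learned gate controls how much
# of the cross-modal signal is mixed back into the text representation.
import torch
import torch.nn as nn


class GatedCrossModalBlock(nn.Module):
    """Illustrative gated cross-modal attention block (assumed design)."""

    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gate computed from the concatenation of text and attended features.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, nonverbal: torch.Tensor) -> torch.Tensor:
        # text:      (batch, seq_t, dim)  verbal features
        # nonverbal: (batch, seq_n, dim)  audio or video features
        attended, _ = self.attn(query=text, key=nonverbal, value=nonverbal)
        g = self.gate(torch.cat([text, attended], dim=-1))  # element-wise gate in [0, 1]
        return self.norm(text + g * attended)               # gated residual fusion


if __name__ == "__main__":
    block = GatedCrossModalBlock()
    t = torch.randn(2, 50, 128)   # text tokens
    a = torch.randn(2, 200, 128)  # audio frames
    print(block(t, a).shape)      # torch.Size([2, 50, 128])
```

The sigmoid gate is one plausible way to let the text stream keep its dominant role while admitting only the useful portion of the non-verbal signal; the paper's actual DTL and GCT designs may differ.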