TEDT: Transformer-Based Encoding–Decoding Translation Network for Multimodal Sentiment Analysis

Citations: 0
Authors
Fan Wang
Shengwei Tian
Long Yu
Jing Liu
Junwen Wang
Kun Li
Yongtao Wang
Affiliations
[1] University of Xinjiang, School of Software
[2] University of Xinjiang, Network and Information Center
Source
Cognitive Computation | 2023, Vol. 15
Keywords
Multimodal sentiment analysis; Transformer; Multimodal fusion; Multimodal attention;
DOI: Not available
Abstract
Multimodal sentiment analysis is a popular and challenging research topic in natural language processing, yet the individual modalities in a video can affect the analysis result to different degrees. Along the temporal dimension, natural-language sentiment is influenced by non-natural-language signals, which may strengthen or weaken the sentiment originally expressed in the language. In addition, non-natural-language features are generally of poor quality, which fundamentally limits the effectiveness of multimodal fusion. To address these issues, we propose a Transformer-based multimodal encoding–decoding translation network that adopts a joint encoding–decoding scheme with text as the primary information and sound and image as secondary information. To reduce the negative impact of non-natural-language data on natural-language data, we propose a modality reinforcement cross-attention module that translates non-natural-language features into the natural-language feature space, improving their quality and enabling better integration of multimodal features. Moreover, a dynamic filtering mechanism removes erroneous information generated during cross-modal interaction to further improve the final output. We evaluated the proposed method on two multimodal sentiment analysis benchmark datasets (MOSI and MOSEI), obtaining accuracies of 89.3% and 85.9%, respectively, and outperforming current state-of-the-art methods. Our model substantially improves the effect of multimodal fusion and analyzes human sentiment more accurately.
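To make the idea of modality reinforcement cross-attention with dynamic filtering concrete, the following is a minimal illustrative sketch, not the authors' implementation: text features act as queries, a non-text modality (audio or vision) supplies keys and values, and a learned gate filters the cross-modal output before it reinforces the text representation. All module and parameter names are hypothetical.

```python
# Illustrative sketch only: cross-attention from text queries to a non-text modality,
# followed by a learned gate that suppresses unreliable cross-modal information
# (a stand-in for the paper's dynamic filtering). Names are hypothetical.
import torch
import torch.nn as nn

class ModalityReinforcementBlock(nn.Module):
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Text queries attend over the secondary (audio or visual) modality.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Element-wise gate deciding how much cross-modal signal to keep.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # text:  (batch, T_text, d_model)   primary modality
        # other: (batch, T_other, d_model)  secondary modality
        translated, _ = self.cross_attn(query=text, key=other, value=other)
        g = self.gate(torch.cat([text, translated], dim=-1))  # values in [0, 1]
        return self.norm(text + g * translated)               # reinforced text features

# Toy usage with random tensors standing in for real text/audio embeddings.
block = ModalityReinforcementBlock()
text_feat = torch.randn(2, 20, 128)
audio_feat = torch.randn(2, 50, 128)
out = block(text_feat, audio_feat)  # shape: (2, 20, 128)
```

In this sketch the gate plays the role of a filter over the translated features; the paper's actual dynamic filtering mechanism and fusion pipeline may differ in structure and detail.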
Pages: 289-303
Page count: 14