Scanning, attention, and reasoning multimodal content for sentiment analysis

Cited by: 5
Authors
Liu, Yun [1 ]
Li, Zhoujun [2 ]
Zhou, Ke [1 ]
Zhang, Leilei [1 ]
Li, Lang [1 ]
Tian, Peng [1 ]
Shen, Shixun [1 ]
Affiliations
[1] Moutai Inst, Dept Automat, Renhuai 564507, Guizhou Province, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal sentiment analysis; Attention; Reasoning; Fusion
DOI
10.1016/j.knosys.2023.110467
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
The rise of social networks has given people platforms to share their lives and emotions, often in multimodal forms such as images paired with descriptive text. Capturing the emotions embedded in this multimodal content poses significant research challenges and has considerable practical value. Existing methods usually make sentiment predictions through a single round of reasoning with multimodal attention networks; however, this may be insufficient for tasks that require deep understanding and complex reasoning. To comprehend multimodal content effectively and predict the correct sentiment tendencies, we propose the Scanning, Attention, and Reasoning (SAR) model for multimodal sentiment analysis. Specifically, a perceptual scanning model is designed to roughly perceive the image and text content, as well as the intrinsic correlation between them. To deeply understand the complementary features between images and texts, an intensive attention model is proposed for cross-modal feature association learning. The multimodal joint features from the scanning and attention models are fused to form the representation of a multimodal node in the social network. A heterogeneous reasoning model implemented with a graph neural network is constructed to capture the influence of network communication in social networks and to make sentiment predictions. Extensive experiments on three benchmark datasets confirm the effectiveness and superiority of our model compared with state-of-the-art methods.
Pages: 11
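
Below is a minimal sketch of how the three stages described in the abstract (coarse scanning of image/text features, cross-modal attention, and graph-based reasoning over social-network neighbours) could be wired together in PyTorch. All module names, dimensions, and the simple mean-aggregation graph step are hypothetical illustrations under assumed pre-extracted unimodal features; this is not the authors' implementation nor the paper's exact heterogeneous GNN.

```python
# Illustrative sketch only: hypothetical modules mimicking the SAR pipeline
# described in the abstract, not the released code of the paper.
import torch
import torch.nn as nn


class ScanningModule(nn.Module):
    """Roughly perceives pooled image/text features and their correlation."""
    def __init__(self, dim):
        super().__init__()
        self.img_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)

    def forward(self, img, txt):
        # img, txt: (batch, dim) pooled unimodal features, assumed precomputed
        # by upstream encoders (e.g. a CNN for images, BERT for text).
        img_s = torch.tanh(self.img_proj(img))
        txt_s = torch.tanh(self.txt_proj(txt))
        corr = img_s * txt_s  # element-wise correlation as a coarse joint cue
        return torch.cat([img_s, txt_s, corr], dim=-1)


class CrossModalAttention(nn.Module):
    """Intensive attention: each modality attends to token-level features of the other."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens, txt_tokens):
        # img_tokens: (batch, regions, dim); txt_tokens: (batch, words, dim)
        i_att, _ = self.img2txt(img_tokens, txt_tokens, txt_tokens)
        t_att, _ = self.txt2img(txt_tokens, img_tokens, img_tokens)
        return torch.cat([i_att.mean(1), t_att.mean(1)], dim=-1)


class SARSketch(nn.Module):
    """Fuses scanning + attention features, then reasons over a social graph."""
    def __init__(self, dim, num_classes=3):
        super().__init__()
        self.scan = ScanningModule(dim)
        self.attend = CrossModalAttention(dim)
        self.fuse = nn.Linear(3 * dim + 2 * dim, dim)
        # One mean-aggregation graph step stands in for the heterogeneous
        # GNN reasoning model described in the abstract.
        self.gnn = nn.Linear(2 * dim, dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, img, txt, img_tokens, txt_tokens, adj):
        node = torch.relu(self.fuse(torch.cat(
            [self.scan(img, txt), self.attend(img_tokens, txt_tokens)], dim=-1)))
        # adj: (nodes, nodes) adjacency over the batch of multimodal posts.
        neigh = adj @ node / adj.sum(-1, keepdim=True).clamp(min=1)
        node = torch.relu(self.gnn(torch.cat([node, neigh], dim=-1)))
        return self.classifier(node)


# Tiny smoke test with random tensors standing in for extracted features.
if __name__ == "__main__":
    B, D = 4, 64
    model = SARSketch(D)
    logits = model(torch.randn(B, D), torch.randn(B, D),
                   torch.randn(B, 9, D), torch.randn(B, 12, D),
                   torch.eye(B))
    print(logits.shape)  # torch.Size([4, 3])
```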