Scanning, attention, and reasoning multimodal content for sentiment analysis

Cited by: 5
Authors
Liu, Yun [1 ]
Li, Zhoujun [2 ]
Zhou, Ke [1 ]
Zhang, Leilei [1 ]
Li, Lang [1 ]
Tian, Peng [1 ]
Shen, Shixun [1 ]
Affiliations
[1] Moutai Inst, Dept Automat, Renhuai 564507, Guizhou Province, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal sentiment analysis; Attention; Reasoning; Fusion
DOI
10.1016/j.knosys.2023.110467
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
The rise of social networks has given people platforms to share their lives and emotions, often in multimodal forms such as images paired with descriptive text. Capturing the emotions embedded in this multimodal content poses significant research challenges and has considerable practical value. Existing methods usually make sentiment predictions through a single round of reasoning with multimodal attention networks; however, this may be insufficient for tasks that require deep understanding and complex reasoning. To comprehend multimodal content effectively and predict the correct sentiment tendencies, we propose the Scanning, Attention, and Reasoning (SAR) model for multimodal sentiment analysis. Specifically, a perceptual scanning model is designed to roughly perceive the image and text content, as well as the intrinsic correlation between them. To deeply understand the complementary features between images and texts, an intensive attention model is proposed for cross-modal feature association learning. The multimodal joint features from the scanning and attention models are fused to form the representation of a multimodal node in the social network. A heterogeneous reasoning model implemented with a graph neural network is constructed to capture the influence of network communication in social networks and to make sentiment predictions. Extensive experiments on three benchmark datasets confirm the effectiveness and superiority of our model compared with state-of-the-art methods.
Pages: 11
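
Below is a minimal sketch of how the three stages described in the abstract (coarse scanning of image/text features, cross-modal attention, and graph-based reasoning over social-network neighbours) could be wired together in PyTorch. All module names, dimensions, and the simple mean-aggregation graph step are hypothetical illustrations under assumed pre-extracted unimodal features; this is not the authors' implementation nor the paper's exact heterogeneous GNN.

```python
# Illustrative sketch only: hypothetical modules mimicking the SAR pipeline
# described in the abstract, not the released code of the paper.
import torch
import torch.nn as nn


class ScanningModule(nn.Module):
    """Roughly perceives pooled image/text features and their correlation."""
    def __init__(self, dim):
        super().__init__()
        self.img_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)

    def forward(self, img, txt):
        # img, txt: (batch, dim) pooled unimodal features, assumed precomputed
        # by upstream encoders (e.g. a CNN for images, BERT for text).
        img_s = torch.tanh(self.img_proj(img))
        txt_s = torch.tanh(self.txt_proj(txt))
        corr = img_s * txt_s  # element-wise correlation as a coarse joint cue
        return torch.cat([img_s, txt_s, corr], dim=-1)


class CrossModalAttention(nn.Module):
    """Intensive attention: each modality attends to token-level features of the other."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens, txt_tokens):
        # img_tokens: (batch, regions, dim); txt_tokens: (batch, words, dim)
        i_att, _ = self.img2txt(img_tokens, txt_tokens, txt_tokens)
        t_att, _ = self.txt2img(txt_tokens, img_tokens, img_tokens)
        return torch.cat([i_att.mean(1), t_att.mean(1)], dim=-1)


class SARSketch(nn.Module):
    """Fuses scanning + attention features, then reasons over a social graph."""
    def __init__(self, dim, num_classes=3):
        super().__init__()
        self.scan = ScanningModule(dim)
        self.attend = CrossModalAttention(dim)
        self.fuse = nn.Linear(3 * dim + 2 * dim, dim)
        # One mean-aggregation graph step stands in for the heterogeneous
        # GNN reasoning model described in the abstract.
        self.gnn = nn.Linear(2 * dim, dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, img, txt, img_tokens, txt_tokens, adj):
        node = torch.relu(self.fuse(torch.cat(
            [self.scan(img, txt), self.attend(img_tokens, txt_tokens)], dim=-1)))
        # adj: (nodes, nodes) adjacency over the batch of multimodal posts.
        neigh = adj @ node / adj.sum(-1, keepdim=True).clamp(min=1)
        node = torch.relu(self.gnn(torch.cat([node, neigh], dim=-1)))
        return self.classifier(node)


# Tiny smoke test with random tensors standing in for extracted features.
if __name__ == "__main__":
    B, D = 4, 64
    model = SARSketch(D)
    logits = model(torch.randn(B, D), torch.randn(B, D),
                   torch.randn(B, 9, D), torch.randn(B, 12, D),
                   torch.eye(B))
    print(logits.shape)  # torch.Size([4, 3])
```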