Image-text sentiment analysis via deep multimodal attentive fusion

Cited by: 166
Authors
Huang, Feiran [1 ]
Zhang, Xiaoming [2 ]
Zhao, Zhonghua [3 ]
Xu, Jie [1 ]
Li, Zhoujun [1 ]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[3] Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Beijing 100029, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
Multimodal learning; Sentiment analysis; Attention model; Fusion;
DOI
10.1016/j.knosys.2019.01.019
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sentiment analysis of social media data is crucial for understanding people's positions, attitudes, and opinions toward a given event, with applications such as election prediction and product evaluation. Although considerable effort has been devoted to single-modality analysis (image or text), much less attention has been paid to the joint analysis of multimodal social media data. Most existing methods for multimodal sentiment analysis simply combine the different data modalities, which leads to unsatisfactory sentiment classification performance. In this paper, we propose a novel image-text sentiment analysis model, Deep Multimodal Attentive Fusion (DMAF), which exploits both the discriminative features of each modality and the internal correlation between visual and semantic content through a mixed fusion framework. Specifically, to automatically focus on the discriminative regions and important words most related to sentiment, two separate unimodal attention models are proposed to learn effective emotion classifiers for the visual and textual modalities, respectively. Then, an intermediate-fusion multimodal attention model is proposed to exploit the internal correlation between visual and textual features for joint sentiment classification. Finally, a late fusion scheme combines the three attention models for sentiment prediction. Extensive experiments demonstrate the effectiveness of our approach on both weakly labeled and manually labeled datasets. (C) 2019 Elsevier B.V. All rights reserved.
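The abstract describes a three-branch design: attention within each modality, an intermediate attentive fusion of the two modalities, and a late fusion of the three branch predictions. The sketch below illustrates that overall flow with plain dot-product attention in numpy; the feature dimensions, scoring vectors, and the simple averaging late fusion are illustrative assumptions, not the actual DMAF architecture or its learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(features, w):
    # features: (n, d) candidate vectors; w: (d,) scoring vector.
    # Returns a weighted summary of the features.
    weights = softmax(features @ w)   # (n,) attention distribution
    return weights @ features         # (d,) attended summary

rng = np.random.default_rng(0)
d = 8                                  # hypothetical feature size
regions = rng.normal(size=(5, d))      # stand-in visual region features
words = rng.normal(size=(7, d))        # stand-in word features

w_v, w_t, w_m = rng.normal(size=(3, d))  # per-branch scoring vectors
W_cls = rng.normal(size=(d, 3))          # shared 3-class sentiment head

# Unimodal attention branches: focus on discriminative regions / words.
v = attend(regions, w_v)
t = attend(words, w_t)

# Intermediate fusion: attend over the two unimodal summaries jointly.
m = attend(np.stack([v, t]), w_m)

# Late fusion: average the three branches' class distributions.
probs = softmax(np.stack([v, t, m]) @ W_cls).mean(axis=0)
print(probs.shape, probs.sum())
```

In a trained model the scoring vectors and classifier weights would be learned end to end, and each branch would typically have its own classification head; here a single shared head keeps the sketch short.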
Pages: 26-37
Page count: 12