Multi-Level Attention Map Network for Multimodal Sentiment Analysis

Cited by: 48
Authors
Xue, Xiaojun [1 ]
Zhang, Chunxia [1 ]
Niu, Zhendong [1 ]
Wu, Xindong [2 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
[2] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ China, Hefei 230009, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Task analysis; Sentiment analysis; Fuses; Visualization; User-generated content; Data mining; Multimodal sentiment analysis; opinion mining; social analysis; multimodal fusion;
DOI
10.1109/TKDE.2022.3155290
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Multimodal sentiment analysis (MSA) is a challenging task because of the complex and complementary interactions between multiple modalities, and it can be widely applied to areas such as product marketing and public opinion monitoring. However, previous works directly utilized the features extracted from multimodal data, largely ignoring noise reduction within and among modalities before multimodal fusion. This paper proposes a multi-level attention map network (MAMN) that filters noise before multimodal fusion and captures the consistent and heterogeneous correlations among multi-granularity features for multimodal sentiment analysis. Architecturally, MAMN comprises three modules: a multi-granularity feature extraction module, a multi-level attention map generation module, and an attention map fusion module. The first module extracts multi-granularity features from multimodal data. The second module filters noise and enhances the representational ability of the multi-granularity features before multimodal fusion. The third module mines the interactions among multi-level attention maps via the proposed extensible co-attention fusion method. Extensive experimental results on three public datasets show that the proposed model significantly outperforms state-of-the-art methods, and demonstrate its effectiveness on both document-based and aspect-based MSA tasks.
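The abstract does not specify the exact form of the extensible co-attention fusion; as a rough illustration of the general idea behind co-attention between two modalities, the following is a minimal NumPy sketch. All names (`co_attention_fuse`, the pooling choice, the scaled dot-product affinity) are assumptions for illustration, not the paper's method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention_fuse(text_feats, image_feats):
    """Toy co-attention between two modalities (illustrative only).

    text_feats:  (n_tokens, d) text features
    image_feats: (n_regions, d) visual features
    Returns one attended summary vector per modality.
    """
    d = text_feats.shape[1]
    # Affinity (attention) map between every text token and image region.
    affinity = text_feats @ image_feats.T / np.sqrt(d)        # (n, m)
    # Each token attends over image regions, and vice versa.
    text_to_image = softmax(affinity, axis=1) @ image_feats   # (n, d)
    image_to_text = softmax(affinity, axis=0).T @ text_feats  # (m, d)
    # Mean-pool each attended sequence into a single summary vector.
    return text_to_image.mean(axis=0), image_to_text.mean(axis=0)

rng = np.random.default_rng(0)
t_sum, v_sum = co_attention_fuse(rng.normal(size=(5, 16)),
                                 rng.normal(size=(3, 16)))
```

The scaled dot-product affinity map here plays the role of the "attention map" in MAMN's terminology: a joint matrix whose rows and columns weight one modality's features by their relevance to the other's.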
Pages: 5105-5118
Page count: 14