Hierarchical Interactive Multimodal Transformer for Aspect-Based Multimodal Sentiment Analysis

Cited by: 56
Authors
Yu, Jianfei [1 ]
Chen, Kai [1 ]
Xia, Rui [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Keywords
Fine-grained opinion mining; aspect-based sentiment analysis; multimodal sentiment analysis; attention; network
DOI
10.1109/TAFFC.2022.3171091
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Aspect-based multimodal sentiment analysis (ABMSA) aims to determine the sentiment polarity of each aspect or entity mentioned in a multimodal post or review. Previous studies on ABMSA can be grouped into two subtasks: aspect-term based multimodal sentiment classification (ATMSC) and aspect-category based multimodal sentiment classification (ACMSC). However, these existing studies have three shortcomings: (1) they ignore object-level semantics in images; (2) they focus primarily on aspect-text and aspect-image interactions; and (3) they fail to consider the semantic gap between text and image representations. To tackle these issues, we propose a general Hierarchical Interactive Multimodal Transformer (HIMT) model for ABMSA. Specifically, we extract salient features with semantic concepts from images via an object detection method, and then propose a hierarchical interaction module that first models the aspect-text and aspect-image interactions, and then captures the text-image interactions. Moreover, an auxiliary reconstruction module is devised to largely eliminate the semantic gap between text and image representations. Experimental results show that our HIMT model significantly outperforms state-of-the-art methods on two benchmarks for ATMSC and one benchmark for ACMSC.
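The two-stage interaction described in the abstract (aspect-text and aspect-image cross-attention first, then text-image interaction over the resulting representations) can be sketched with plain scaled dot-product attention. This is a minimal illustration of the hierarchical ordering only, not the paper's architecture: the dimensions, token counts, and single-head attention here are assumptions, and HIMT itself uses Transformer layers rather than this bare attention function.

```python
import numpy as np

def attend(query, keys, values):
    """Scaled dot-product attention: each query row attends over keys/values."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
d = 64
aspect = rng.standard_normal((3, d))   # 3 aspect tokens (illustrative)
text = rng.standard_normal((20, d))    # 20 text tokens
image = rng.standard_normal((36, d))   # 36 detected object-region features

# Stage 1: aspect-text and aspect-image interactions.
aspect_text = attend(aspect, text, text)
aspect_image = attend(aspect, image, image)

# Stage 2: text-image interaction over the aspect-aware representations.
fused = attend(aspect_text, aspect_image, aspect_image)
print(fused.shape)  # (3, 64)
```

The ordering matters: conditioning both modalities on the aspect before fusing them keeps the final text-image interaction focused on aspect-relevant content rather than the whole post.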
Pages: 1966-1978
Page count: 13