Hierarchical Interactive Multimodal Transformer for Aspect-Based Multimodal Sentiment Analysis

Cited by: 56
Authors
Yu, Jianfei [1 ]
Chen, Kai [1 ]
Xia, Rui [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Keywords
Fine-grained opinion mining; aspect-based sentiment analysis; multimodal sentiment analysis; attention; network
DOI
10.1109/TAFFC.2022.3171091
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Aspect-based multimodal sentiment analysis (ABMSA) aims to determine the sentiment polarity of each aspect or entity mentioned in a multimodal post or review. Previous studies on ABMSA can be grouped into two subtasks: aspect-term based multimodal sentiment classification (ATMSC) and aspect-category based multimodal sentiment classification (ACMSC). However, these existing studies have three shortcomings: (1) they ignore object-level semantics in images; (2) they focus primarily on aspect-text and aspect-image interactions; and (3) they fail to consider the semantic gap between text and image representations. To tackle these issues, we propose a general Hierarchical Interactive Multimodal Transformer (HIMT) model for ABMSA. Specifically, we extract salient features with semantic concepts from images via an object detection method, and then propose a hierarchical interaction module that first models the aspect-text and aspect-image interactions, and then captures the text-image interactions. Moreover, an auxiliary reconstruction module is devised to largely eliminate the semantic gap between text and image representations. Experimental results show that our HIMT model significantly outperforms state-of-the-art methods on two benchmarks for ATMSC and one benchmark for ACMSC.
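The two-stage interaction described in the abstract (aspect-text and aspect-image cross-attention first, then text-image interaction over the resulting representations) can be sketched with plain scaled dot-product attention. This is a minimal illustration of the hierarchical ordering only, not the paper's architecture: the dimensions, token counts, and single-head attention here are assumptions, and HIMT itself uses Transformer layers rather than this bare attention function.

```python
import numpy as np

def attend(query, keys, values):
    """Scaled dot-product attention: each query row attends over keys/values."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
d = 64
aspect = rng.standard_normal((3, d))   # 3 aspect tokens (illustrative)
text = rng.standard_normal((20, d))    # 20 text tokens
image = rng.standard_normal((36, d))   # 36 detected object-region features

# Stage 1: aspect-text and aspect-image interactions.
aspect_text = attend(aspect, text, text)
aspect_image = attend(aspect, image, image)

# Stage 2: text-image interaction over the aspect-aware representations.
fused = attend(aspect_text, aspect_image, aspect_image)
print(fused.shape)  # (3, 64)
```

The ordering matters: conditioning both modalities on the aspect before fusing them keeps the final text-image interaction focused on aspect-relevant content rather than the whole post.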
Pages: 1966-1978
Page count: 13