Aspect Based Sentiment Analysis on Multimodal Data: A Transformer and Low-Rank Fusion Approach

Cited: 0
Authors
Jin, Meilin [1]
Shao, Lianhe [1]
Wang, Xihan [1]
Yan, Qianqian [1]
Chu, Zhulu [1]
Luo, Tongtong [1]
Tang, Jiacheng [1]
Gao, Quanli [1]
Affiliations
[1] Xi'an Polytechnic University, School of Computer Science, Xi'an, People's Republic of China
Source
2024 4th International Conference on Computer Communication and Artificial Intelligence (CCAI 2024), 2024
Keywords
multimodal sentiment analysis; cross-modal attention mechanism; low-rank fusion
DOI
10.1109/CCAI61966.2024.10603022
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Video has become the main medium through which users share their daily lives and express opinions, much of which carries emotional information. Sentiment analysis of videos can help in understanding user behavior and thereby support better improvements and services. However, most existing sentiment analysis operates on a single modality, and unimodal approaches suffer from low accuracy and are prone to ambiguity, while many existing multimodal methods lack effective interaction between modalities. This paper proposes a model named Aspect Based Sentiment Analysis on Multimodal Data: A Transformer and Low-Rank Fusion Approach (ABSA-TLRF). Specifically, ABSA-TLRF combines a cross-modal alignment module built on a cross-modal attention mechanism with an efficient low-rank fusion method to integrate information within and across modalities. This design enables global-local information interaction and yields more accurate fused emotion representations, on which the model performs sentiment classification for high-level multimodal sentiment analysis. Experimental results show that the model outperforms several state-of-the-art methods on three commonly used datasets. Our findings suggest that combining a cross-modal attention-based alignment module with efficient low-rank fusion improves the understanding of multimodal content and thereby boosts sentiment analysis performance.
Pages: 332-338
Page count: 7
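
The paper's implementation is not included in this record. As a rough illustration of the two components the abstract names, the following is a minimal PyTorch sketch of (i) a cross-modal alignment module based on cross-modal attention and (ii) low-rank bilinear fusion in the spirit of LMF (Liu et al., 2018, arXiv:1806.00064). All module names, dimensions, the fusion rank, and the mean-pooling step are illustrative assumptions, not the authors' design.

```python
# Minimal sketch (not the authors' code): cross-modal attention alignment
# followed by low-rank fusion, as a toy stand-in for the ABSA-TLRF pipeline.
import torch
import torch.nn as nn

class CrossModalAlignment(nn.Module):
    """Re-expresses a source modality in the target modality's time steps."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, target, source):
        # Queries come from the target modality; keys/values from the source,
        # so each target position attends over the whole source sequence.
        aligned, _ = self.attn(query=target, key=source, value=source)
        return aligned

class LowRankFusion(nn.Module):
    """Low-rank bilinear fusion of two modality vectors (cf. LMF)."""
    def __init__(self, dim_a, dim_b, out_dim, rank=8):
        super().__init__()
        # One rank-r factor per modality; the full outer-product tensor of a
        # bilinear fusion is never materialized, so parameters grow linearly
        # with the rank instead of multiplicatively with the input dims.
        self.factor_a = nn.Parameter(torch.randn(rank, dim_a + 1, out_dim) * 0.02)
        self.factor_b = nn.Parameter(torch.randn(rank, dim_b + 1, out_dim) * 0.02)

    def forward(self, a, b):
        # Append a constant 1 so unimodal terms survive the product.
        ones = a.new_ones(a.size(0), 1)
        a = torch.cat([a, ones], dim=-1)               # (B, dim_a + 1)
        b = torch.cat([b, ones], dim=-1)               # (B, dim_b + 1)
        fused_a = torch.einsum('bd,rdo->rbo', a, self.factor_a)
        fused_b = torch.einsum('bd,rdo->rbo', b, self.factor_b)
        # Elementwise product per rank, summed over ranks: low-rank bilinear map.
        return (fused_a * fused_b).sum(dim=0)          # (B, out_dim)

# Toy usage: text/visual features -> align -> pool -> fuse -> 3-way sentiment.
text  = torch.randn(2, 20, 64)   # (batch, text_len, dim)
image = torch.randn(2, 49, 64)   # (batch, image_regions, dim)
align = CrossModalAlignment(dim=64)
fuse  = LowRankFusion(64, 64, out_dim=32)
head  = nn.Linear(32, 3)         # negative / neutral / positive
img2txt = align(text, image)                     # image aligned to text steps
logits = head(fuse(text.mean(1), img2txt.mean(1)))
print(logits.shape)              # torch.Size([2, 3])
```

The design choice sketched here mirrors why the abstract calls the fusion "efficient": a full bilinear (tensor-product) fusion of two 64-dimensional inputs would require a weight tensor of roughly 65 x 65 x 32 entries, whereas the rank-8 factorization keeps two small per-modality factors and sums their products over the rank dimension.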