Multi-Modal Sentiment Analysis Based on Image and Text Fusion Based on Cross-Attention Mechanism

Cited by: 8
Authors
Li, Hongchan [1 ]
Lu, Yantong [1 ]
Zhu, Haodong [1 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Sch Comp Sci & Technol, Zhengzhou 450002, Peoples R China
Keywords
multi-modal sentiment analysis; ALBert; CBAM; DenseNet121; deep learning; feature extraction; classification
DOI
10.3390/electronics13112069
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Research on uni-modal sentiment analysis has achieved great success, but emotion in real life is mostly multi-modal, expressed not only in text but also in images, audio, video, and other forms. These modalities reinforce one another, so mining the connections between them can further improve the accuracy of sentiment analysis. To this end, this paper introduces MCAM, a cross-attention-based multi-modal fusion model for images and text. For text, we use the ALBert pre-trained model to extract text features and BiLSTM to capture textual context; for images, we use DenseNet121 to extract image features and CBAM to focus on emotion-related regions. Finally, we fuse the extracted text and image features with multi-modal cross-attention and classify the fused output to determine emotional polarity. In comparative experiments on the public MVSA and TumEmo datasets, our model outperforms the baseline models, with accuracy and F1 scores of 86.5% and 75.3% on MVSA and 85.5% and 76.7% on TumEmo, respectively. Ablation experiments further confirm that multi-modal fusion outperforms single-modal sentiment analysis.
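The fusion stage described in the abstract can be illustrated with scaled dot-product cross-attention, where text-side features act as queries over image-side features. This is a minimal NumPy sketch, not the authors' implementation: the feature dimensions, random projection matrices, and single attention head are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_feats, image_feats, d_k=64, seed=0):
    """Text tokens attend over image regions: queries come from text,
    keys/values from the image, yielding one fused vector per token."""
    rng = np.random.default_rng(seed)
    d_text = text_feats.shape[-1]
    d_img = image_feats.shape[-1]
    # Randomly initialized projections stand in for learned weights.
    W_q = rng.standard_normal((d_text, d_k)) / np.sqrt(d_text)
    W_k = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    W_v = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    Q = text_feats @ W_q                  # (n_text, d_k)
    K = image_feats @ W_k                 # (n_img, d_k)
    V = image_feats @ W_v                 # (n_img, d_k)
    scores = Q @ K.T / np.sqrt(d_k)       # (n_text, n_img)
    weights = softmax(scores, axis=-1)    # attention over image regions
    return weights @ V                    # (n_text, d_k) fused features

# Example: 8 text tokens (e.g. BiLSTM outputs) attend over a 7x7=49
# grid of image-region features (e.g. a CNN feature map).
text = np.random.default_rng(1).standard_normal((8, 256))
image = np.random.default_rng(2).standard_normal((49, 512))
fused = cross_attention(text, image)
print(fused.shape)  # (8, 64)
```

In the full model, the fused representation would be pooled and passed to a classifier for emotional polarity; here the projections are random, so only the shapes and the attention mechanics are meaningful.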
Pages: 22