Image-Text Multimodal Sentiment Analysis Framework of Assamese News Articles Using Late Fusion

Cited by: 17

Authors:

Das, Ringki [1]

Singh, Thoudam Doren [1]

Affiliations:

[1] Natl Inst Technol Silchar, Dept Comp Sci & Engn, Silchar 788010, Assam, India

Source:

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING | 2023 / Vol. 22 / No. 06

Keywords:

Multimodal sentiment analysis; low resource language; caption generation; machine learning classifier; late fusion;

DOI:

10.1145/3584861

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Before the arrival of the web as a corpus, people detected positive and negative news based on their understanding of the textual content of physical newspapers rather than through automatic identification from readily available e-newspapers. Thus, earlier sentiment analysis approaches are based on unimodal data, and less effort has been devoted to multimodal data. However, the presence of multimodal information helps us to get a clearer understanding of the sentiment. To the best of our knowledge, little work has been done on image-text multimodal sentiment analysis frameworks for Assamese, a low-resource Indian language mostly spoken in the northeast part of India. We built an Assamese news articles dataset consisting of news text, the associated images, and one image caption to conduct an experimental study. Focusing on important words and discriminative regions of the images mostly related to sentiment, two individual unimodal models, textual and visual, are proposed. The visual model is developed using an encoder-decoder-based image caption generation system. An image-text multimodal approach is proposed to explore the internal correlation between textual and visual features for joint sentiment classification. Finally, we propose the multimodal sentiment analysis framework, i.e., Textual Visual Multimodal Fusion, by employing a late fusion scheme to merge the three different modalities for the final sentiment prediction. Experimental results on the in-house Assamese dataset demonstrate that the contextual integration of multimodal features delivers better performance than unimodal features.

Pages: 30
