Multimodal Sentiment Analysis using Deep Learning Fusion Techniques and Transformers

Times cited: 0
Authors
Bin Habib, Muhaimin [1]
Hafiz, Md. Ferdous Bin [2]
Khan, Niaz Ashraf [2]
Hossain, Sohrab [1]
Affiliations
[1] East Delta Univ, Dept Comp Sci & Engn, Chattogram, Bangladesh
[2] Univ Liberal Arts Bangladesh, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
Multimodal sentiment analysis; deep learning; transfer learning; natural language processing; image processing; BERT
DOI
10.14569/IJACSA.2024.0150686
CLC number
TP301 [Theory and Methods]
Discipline code
081202
Abstract
Multimodal sentiment analysis extracts sentiment from multiple modalities such as text, images, audio, and video. Most current sentiment classifiers rely on a single modality, which limits their effectiveness because of their comparatively simple architectures. This paper studies multimodal sentiment analysis by combining several deep learning models for text and image processing. The fusion configurations are RoBERTa with EfficientNet-B3, RoBERTa with ResNet50, and BERT with MobileNetV2. The paper focuses on improving sentiment analysis through the combination of text and image data, and the performance of each fusion model is analyzed using accuracy, confusion matrices, and ROC curves. The fusion techniques implemented in this study outperform previous benchmark models; notably, the EfficientNet-B3 and RoBERTa combination achieves the highest accuracy (75%) and F1 score (74.9%). This research contributes to the field of sentiment analysis by demonstrating the potential of combining textual and visual data for more accurate sentiment prediction, laying groundwork for future work on multimodal sentiment analysis.
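The record does not include implementation details for the text-image fusion. As a rough illustration of the kind of architecture the abstract describes, the following is a minimal sketch, assuming a simple feature-level (late) fusion that concatenates a pretrained RoBERTa text embedding with EfficientNet-B3 image features before a small classification head; the layer sizes, number of sentiment classes, and use of Hugging Face transformers and timm are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of RoBERTa + EfficientNet-B3 feature-level fusion.
# Assumes Hugging Face `transformers` and `timm`; details are not from the paper.
import torch
import torch.nn as nn
import timm
from transformers import RobertaModel


class TextImageFusion(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Text branch: pretrained RoBERTa encoder (768-dim hidden states).
        self.text_encoder = RobertaModel.from_pretrained("roberta-base")
        # Image branch: pretrained EfficientNet-B3 with its classifier head removed.
        self.image_encoder = timm.create_model(
            "efficientnet_b3", pretrained=True, num_classes=0
        )
        img_dim = self.image_encoder.num_features  # 1536 for EfficientNet-B3
        # Fusion head: concatenate both feature vectors and classify sentiment.
        self.classifier = nn.Sequential(
            nn.Linear(768 + img_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, input_ids, attention_mask, images):
        text_out = self.text_encoder(input_ids=input_ids, attention_mask=attention_mask)
        text_feat = text_out.last_hidden_state[:, 0]       # first-token ("CLS"-style) embedding
        img_feat = self.image_encoder(images)               # (batch, 1536) pooled image features
        fused = torch.cat([text_feat, img_feat], dim=-1)    # simple concatenation fusion
        return self.classifier(fused)
```

The same skeleton would cover the other two configurations mentioned in the abstract by swapping the encoders (e.g. ResNet50 via timm, or BERT via `BertModel`); how the authors actually fuse and train the branches is not specified here.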
Pages: 856-863
Page count: 8