An inter-modal attention-based deep learning framework using unified modality for multimodal fake news, hate speech and offensive language detection

Cited by: 0
Authors
Ayetiran, Eniafe Festus [1 ,2 ]
Özgöbek, Özlem [1 ]
Affiliations
[1] Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
[2] Department of Computer Science, Achievers University, Nigeria
Keywords
Deep neural networks; Fake detection; Multilayer neural networks; Speech recognition
DOI
Not available
Abstract
Fake news, hate speech and offensive language are three related scourges currently affecting modern societies. The text modality has been widely used for the computational detection of these phenomena. Recently, multimodal studies in this direction have been attracting considerable interest because of the potential of other modalities to contribute to the detection of these menaces. However, a major problem in multimodal content understanding is how to effectively model the complementarity of the different modalities, given their diverse characteristics and features. From a multimodal point of view, the three tasks have been studied mainly using the image and text modalities. Improving the effectiveness of the diverse multimodal approaches remains an open research topic. In addition to the traditional text and image modalities, we consider image–texts (text embedded in images), which are rarely used in previous studies but contain useful information for enhancing the effectiveness of a prediction model. To ease multimodal content understanding and enhance prediction, we leverage recent advances in computer vision and deep learning for these tasks. First, we unify the modalities by creating a text representation of the images and image–texts, in addition to the main text. Second, we propose a multi-layer deep neural network with an inter-modal attention mechanism to model the complementarity among these modalities. We conduct extensive experiments on three standard datasets covering the three tasks. Experimental results show that the detection of fake news, hate speech and offensive language can benefit from this approach. Furthermore, we conduct robust ablation experiments to show the effectiveness of our approach. Our model predominantly outperforms prior works across the datasets. © 2024 The Author(s)
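The inter-modal attention step sketched in the abstract can be illustrated with a minimal numpy example. This is a generic scaled dot-product cross-attention sketch, not the paper's actual architecture: all names, dimensions, and the use of random features are illustrative assumptions; here the main-text token features (queries) attend over features derived from the image/image–text modality (keys and values).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, d_k):
    # Scaled dot-product attention: one modality's features (queries)
    # attend over another modality's features (keys/values), producing
    # context-aware query features.
    scores = query_feats @ context_feats.T / np.sqrt(d_k)  # (n_q, n_c)
    weights = softmax(scores, axis=-1)                     # rows sum to 1
    return weights @ context_feats                         # (n_q, d_k)

# Hypothetical token features for two unified (text-form) modalities.
rng = np.random.default_rng(0)
d = 8
text = rng.standard_normal((5, d))      # main-text token features
img_text = rng.standard_normal((3, d))  # image/OCR-derived text features
fused = cross_attention(text, img_text, d)
print(fused.shape)  # (5, 8)
```

In a full model, such fused representations would typically be concatenated or pooled with the original features and fed to further dense layers for classification.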