Multimodal hate speech detection via multi-scale visual kernels and knowledge distillation architecture

Cited by: 18

Authors:

Chhabra, Anusha [1]

Vishwakarma, Dinesh Kumar [1]

Affiliations:

[1] Delhi Technol Univ, Dept Informat Technol, Biometr Res Lab, Delhi, India

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2023 / Volume 126

Keywords:

Hate content; Deep learning; Machine learning; Multimodal; Adaptive receptive field; SENTIMENT ANALYSIS;

DOI:

10.1016/j.engappai.2023.106991

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

People increasingly use social media platforms to express themselves by posting images and text. As a result, hate content is on the rise, which makes practical analysis of images together with their captions necessary. The relationship between the image and caption modalities is crucial to this analysis. However, most methods combine image and caption features using pretrained deep learning architectures with millions of parameters, without integrating a specialized attention module, which leads to less desirable results. In response, this paper proposes a novel multimodal architecture for identifying hateful meme content. The architecture contains a novel "multi-scale kernel attentive visual" (MSKAV) module that uses an efficient multi-branch structure to extract discriminative visual features. MSKAV obtains an adaptive receptive field through multi-scale kernels and incorporates a multi-directional visual attention module to highlight important spatial regions. The model also contains a novel "knowledge distillation-based attentional caption" (KDAC) module, which uses a transformer-based self-attentive block to extract discriminative features from meme captions. In experiments on the multimodal hate speech benchmarks MultiOff, Hateful Memes, and MMHS150K, the model achieves accuracy scores of 0.6250, 0.8750, and 0.8078, respectively, and AUC scores of 0.6557, 0.8363, and 0.7665, respectively, which the authors report as exceeding state-of-the-art multimodal hate speech identification models.
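The abstract names knowledge distillation but does not give the loss used by KDAC. As a rough illustration only, the sketch below shows the generic soft-target distillation objective (temperature-scaled KL divergence plus hard-label cross-entropy), which may differ from the authors' formulation. The function names, the `temperature` and `alpha` parameters, and their default values are assumptions introduced here, not details from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss (assumed form, not the paper's exact loss).

    alpha weights the soft-target term; (1 - alpha) weights the
    hard-label cross-entropy term.
    """
    # Soft-target term: KL(teacher || student) at the given temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

# Hypothetical two-class example (index 1 = hateful, index 0 = not hateful).
loss = distillation_loss([0.2, 1.1], [0.1, 2.3], label=1)
```

When the student logits equal the teacher logits, the soft-target term vanishes and only the weighted cross-entropy remains.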


Pages: 15
