Multimodal hate speech detection via multi-scale visual kernels and knowledge distillation architecture

Cited by: 18

Authors:

Chhabra, Anusha [1]

Vishwakarma, Dinesh Kumar [1]

Affiliations:

[1] Delhi Technol Univ, Dept Informat Technol, Biometr Res Lab, Delhi, India

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2023 / Vol. 126

Keywords:

Hate content; Deep learning; Machine learning; Multimodal; Adaptive receptive field; SENTIMENT ANALYSIS;

DOI:

10.1016/j.engappai.2023.106991

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

People increasingly use social media platforms to express themselves by posting visuals and texts. As a result, hate content is on the rise, necessitating practical visual caption analysis. Thus, the relationship between image and caption modalities is crucial in visual caption analysis. Contrarily, most methods combine features from the image and caption modalities using deep learning architectures with millions of parameters already trained without integrating a specialized attention module, resulting in less desirable outcomes. This paper suggests a novel multi-modal architecture for identifying hateful memetic information in response to the above observation. The proposed architecture contains a novel "multi-scale kernel attentive visual" (MSKAV) module that uses an efficient multi-branch structure to extract discriminative visual features. Additionally, MSKAV utilizes an adaptive receptive field using multi-scale kernels. MSKAV also incorporates a multi-directional visual attention module to highlight spatial regions of importance. The proposed model also contains a novel "knowledge distillation-based attentional caption" (KDAC) module. It uses a transformer-based self-attentive block to extract discriminative features from meme captions. Thorough experimentation on multi-modal hate speech benchmarks MultiOff, Hateful Memes, and MMHS150K datasets achieved accuracy scores of 0.6250, 0.8750, and 0.8078, respectively. It also reaches impressive AUC scores of 0.6557, 0.8363, and 0.7665 on the three datasets, respectively, beating SOTA multi-modal hate speech identification models.


Pages: 15
