Multimodal hate speech detection via multi-scale visual kernels and knowledge distillation architecture

Cited by: 18
Authors
Chhabra, Anusha [1]
Vishwakarma, Dinesh Kumar [1]
Affiliation
[1] Delhi Technol Univ, Dept Informat Technol, Biometr Res Lab, Delhi, India
Keywords
Hate content; Deep learning; Machine learning; Multimodal; Adaptive receptive field; Sentiment analysis
DOI
10.1016/j.engappai.2023.106991
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
People increasingly use social media platforms to express themselves by posting images and text. As a result, hate content is on the rise, which makes practical analysis of images together with their captions necessary. The relationship between the image and caption modalities is crucial to this analysis. However, most methods combine features from the two modalities using pre-trained deep learning architectures with millions of parameters and without a specialized attention module, which leads to weaker results. In response, this paper proposes a novel multi-modal architecture for identifying hateful memes. The architecture contains a novel "multi-scale kernel attentive visual" (MSKAV) module that uses an efficient multi-branch structure to extract discriminative visual features. MSKAV obtains an adaptive receptive field by using multi-scale kernels, and it incorporates a multi-directional visual attention module to highlight important spatial regions. The model also contains a novel "knowledge distillation-based attentional caption" (KDAC) module, which uses a transformer-based self-attentive block to extract discriminative features from meme captions. In experiments on the multi-modal hate speech benchmarks MultiOff, Hateful Memes, and MMHS150K, the model achieves accuracy scores of 0.6250, 0.8750, and 0.8078, respectively, and AUC scores of 0.6557, 0.8363, and 0.7665, respectively, outperforming state-of-the-art multi-modal hate speech identification models.
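The idea of an adaptive receptive field built from multi-scale kernels can be illustrated with a small sketch. This is not the authors' MSKAV implementation, which the abstract does not specify. It assumes a generic selective-kernel-style fusion: several branches filter the same input at different kernel sizes, and a softmax over per-branch global descriptors decides how much each scale contributes. The names `box_filter` and `multi_scale_fuse` are hypothetical, mean filters stand in for learned convolutions, and the pooled descriptor is used directly as the branch score where a real model would pass it through learned layers.

```python
import math

def box_filter(image, k):
    """Apply a k x k mean filter with zero padding.
    Assumption: this stands in for a learned k x k convolution."""
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += image[y][x]
            out[i][j] = total / (k * k)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def multi_scale_fuse(image, kernel_sizes=(1, 3, 5)):
    """Fuse branches with different kernel sizes using soft attention.
    Assumption: the branch score is the global average of the branch output."""
    h, w = len(image), len(image[0])
    branches = [box_filter(image, k) for k in kernel_sizes]
    # One global descriptor per branch (global average pooling).
    descriptors = [sum(map(sum, b)) / (h * w) for b in branches]
    # Soft selection across scales: the weights sum to 1.
    weights = softmax(descriptors)
    fused = [
        [sum(wt * b[i][j] for wt, b in zip(weights, branches)) for j in range(w)]
        for i in range(h)
    ]
    return fused, weights

# Toy 4 x 4 "image" with a bright corner region.
image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
fused, weights = multi_scale_fuse(image)
```

Because the branch weights come from a softmax, the effective receptive field is a convex mix of the kernel sizes, and it shifts toward whichever scale responds most strongly to a given input.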
Pages: 15