Abusive Comment Detection from Bengali-English Code-Mixed Social Media Texts Using Ensemble of Deep Learning

Authors
Fahim, Iftekhar [1]
Ahsan, Shawly [1]
Hoque, Mohammed Moshiul [1]
Affiliations
[1] Chittagong Univ Engn & Technol, Chattogram 4349, Bangladesh
Source
ARTIFICIAL INTELLIGENCE AND KNOWLEDGE PROCESSING, AIKP 2024 | 2025 / Vol. 2228
Keywords
Natural language processing; Code-mixing; Deep learning; Text processing; Abusive content detection; AGREEMENT;
DOI
10.1007/978-3-031-73477-9_18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Code-mixing, the seamless combination of multiple languages within a single text, has become increasingly common on social media platforms. The pervasiveness of aggressive content and offensive language on social media presents significant challenges and necessitates automatic detection methods. The problem becomes more complex for code-mixed text owing to the cultural nuances of the different languages. Efforts to identify abusive content in code-mixed text have concentrated primarily on high-resource languages, while research on resource-constrained languages, such as Bengali mixed with English, remains limited. Some studies have aimed at detecting abusive content in transliterated Bengali texts, but there is a notable absence of research on detecting abusive content in Bengali-English code-mixed texts. To address this gap, this paper presents a custom-built Bengali-English code-mixed dataset containing 2700 annotated comments categorized as abusive or non-abusive. To facilitate research in this area, this work proposes an ensemble of deep learning (DL) models: CNN (using GloVe embeddings), LSTM (implemented with Keras), and BiLSTM (utilizing FastText embeddings). The ensemble approach attained the highest weighted F1-score of 0.81. This research aims to tackle the growing issue of abusive content in code-mixed data and thereby help create safer and more inclusive online environments.
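The abstract does not state how the three models' outputs are combined, so the following is only a minimal sketch under an assumption: soft voting, where each model's class probabilities are averaged and the most probable class is taken. The function names `soft_vote` and `weighted_f1` and the toy probability arrays are hypothetical and are not taken from the paper; the arrays stand in for the outputs of the CNN, LSTM, and BiLSTM models. The weighted F1-score is the per-class F1 averaged with weights proportional to each class's support.

```python
import numpy as np


def soft_vote(prob_list):
    """Average per-model probability arrays of shape (n_samples, n_classes)
    and return the index of the most probable class for each sample."""
    stacked = np.stack(prob_list, axis=0)  # (n_models, n_samples, n_classes)
    return stacked.mean(axis=0).argmax(axis=1)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for label in np.unique(y_true):
        tp = np.sum((y_pred == label) & (y_true == label))
        fp = np.sum((y_pred == label) & (y_true != label))
        fn = np.sum((y_pred != label) & (y_true == label))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * np.sum(y_true == label)
    return total / len(y_true)


# Toy data (made up for illustration): 0 = non-abusive, 1 = abusive.
y_true = [1, 0, 1, 0]
cnn_probs = np.array([[0.2, 0.8], [0.6, 0.4], [0.7, 0.3], [0.4, 0.6]])
lstm_probs = np.array([[0.4, 0.6], [0.3, 0.7], [0.4, 0.6], [0.3, 0.7]])
bilstm_probs = np.array([[0.1, 0.9], [0.7, 0.3], [0.3, 0.7], [0.6, 0.4]])

y_pred = soft_vote([cnn_probs, lstm_probs, bilstm_probs])
score = weighted_f1(y_true, y_pred)
print(y_pred, round(score, 4))
```

In this toy case the averaged probabilities classify the second and third comments correctly even though one individual model gets each of them wrong, which is the usual motivation for ensembling. Other combination rules, such as majority voting or weighted averaging, would fit the abstract's description equally well.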
Pages: 252-267
Page count: 16