Abusive Comment Detection from Bengali-English Code-Mixed Social Media Texts Using Ensemble of Deep Learning

Cited by: 0

Authors:

Fahim, Iftekhar [1]

Ahsan, Shawly [1]

Hoque, Mohammed Moshiul [1]

Affiliations:

[1] Chittagong Univ Engn & Technol, Chattogram 4349, Bangladesh

Source:

ARTIFICIAL INTELLIGENCE AND KNOWLEDGE PROCESSING, AIKP 2024 | 2025 / Vol. 2228

Keywords:

Natural language processing; Code-mixing; Deep learning; Text processing; Abusive content detection; AGREEMENT;

DOI:

10.1007/978-3-031-73477-9_18

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Code-mixing, the combination of multiple languages within a single text, has become increasingly common on social media platforms. The pervasiveness of aggressive content and offensive language on social media presents significant challenges and calls for automatic detection methods. The problem becomes more complex for code-mixed text owing to the cultural nuances of the languages involved. Efforts to identify abusive content in code-mixed text have concentrated primarily on high-resource languages, while research on resource-constrained languages, such as Bengali mixed with English, remains limited. Some studies have aimed at detecting abusive content in transliterated Bengali texts. However, there is a notable absence of research addressing the detection of abusive content in Bengali-English code-mixed texts. To address this gap, this paper presents a custom-built Bengali-English code-mixed dataset containing 2700 annotated comments categorized as abusive or non-abusive. To facilitate research in this area, this work proposes an ensemble of deep learning (DL) models: CNN (using GloVe embeddings), LSTM (implemented with Keras), and BiLSTM (utilizing FastText embeddings). The ensemble approach attained the highest weighted F1-score, 0.81. This research aims to tackle the growing issue of abusive content in code-mixed data and to help create safer and more inclusive online environments.
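
Illustrative note (not part of the original record): the abstract does not say how the three models are combined, so the following is only a minimal sketch that assumes soft voting, i.e. averaging each model's predicted probability for the abusive class, and then computes a support-weighted F1-score. The function names, the 0.5 threshold, and all probability values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def soft_vote(prob_lists, threshold=0.5):
    """Average per-model 'abusive' probabilities and threshold the mean.

    Soft voting is an assumption here; the source does not state
    how the CNN, LSTM, and BiLSTM outputs are actually combined.
    """
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    means = [sum(p[i] for p in prob_lists) / n_models for i in range(n_samples)]
    return [1 if m >= threshold else 0 for m in means]


def weighted_f1(y_true, y_pred):
    """Per-class F1, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * count
    return total / len(y_true)


# Hypothetical per-model probabilities for four comments (1 = abusive).
cnn_probs = [0.9, 0.2, 0.6, 0.4]
lstm_probs = [0.8, 0.4, 0.3, 0.1]
bilstm_probs = [0.7, 0.3, 0.7, 0.6]
y_true = [1, 0, 1, 1]

y_pred = soft_vote([cnn_probs, lstm_probs, bilstm_probs])
score = weighted_f1(y_true, y_pred)
print(y_pred, round(score, 4))
```

Weighting each class's F1 by its support keeps the score meaningful when the abusive and non-abusive classes are unevenly represented, which is a common reason for reporting the weighted variant.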


Pages: 252-267

Page count: 16

