Abusive Comment Detection from Bengali-English Code-Mixed Social Media Texts Using Ensemble of Deep Learning

Authors
Fahim, Iftekhar [1]
Ahsan, Shawly [1]
Hoque, Mohammed Moshiul [1]
Affiliations
[1] Chittagong Univ Engn & Technol, Chattogram 4349, Bangladesh
Source
ARTIFICIAL INTELLIGENCE AND KNOWLEDGE PROCESSING, AIKP 2024 | 2025 / Vol. 2228
Keywords
Natural language processing; Code-mixing; Deep learning; Text processing; Abusive content detection; AGREEMENT;
DOI
10.1007/978-3-031-73477-9_18
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Code-mixing, the combination of multiple languages within a single text, has become increasingly common on social media platforms. The pervasiveness of aggressive content and offensive language on social media presents significant challenges and calls for automatic detection methods. The problem becomes more complex for code-mixed text owing to the cultural nuances of the different languages involved. Efforts to identify abusive content in code-mixed text have concentrated primarily on high-resource languages, while research on resource-constrained languages, such as Bengali mixed with English, remains limited. Some studies have aimed at detecting abusive content in transliterated Bengali texts, but there is a notable absence of research on detecting abusive content in Bengali-English code-mixed texts. To address this gap, this paper presents a custom-built Bengali-English code-mixed dataset containing 2700 annotated comments categorized as abusive or non-abusive. To facilitate research in this area, this work proposes an ensemble of deep learning (DL) models: CNN (using GloVe embeddings), LSTM (implemented with Keras), and BiLSTM (using FastText embeddings). The ensemble approach attained the highest weighted F1-score, 0.81. This research aims to tackle the growing issue of abusive content in code-mixed data and to help create safer and more inclusive online environments.
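The abstract does not say how the three models' outputs are combined, so the following is only a minimal sketch under stated assumptions: it assumes soft voting (averaging each model's predicted probability for the abusive class) and shows how a weighted F1-score is computed for two classes. The function names, the 0.5 threshold, and all probability values are hypothetical and are not taken from the paper.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-model probabilities of the 'abusive' class (assumed
    combination rule) and label a comment 1 (abusive) if the mean
    reaches the threshold, else 0 (non-abusive)."""
    n_models = len(prob_lists)
    return [int(sum(col) / n_models >= threshold) for col in zip(*prob_lists)]


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = sum(1 for t in y_true if t == label)
        total += f1 * support
    return total / len(y_true)


# Hypothetical probabilities for four comments from three stand-in models.
cnn_probs = [0.9, 0.2, 0.6, 0.4]
lstm_probs = [0.8, 0.3, 0.4, 0.7]
bilstm_probs = [0.7, 0.1, 0.7, 0.3]

y_true = [1, 0, 1, 1]  # made-up gold labels: 1 = abusive, 0 = non-abusive
preds = soft_vote([cnn_probs, lstm_probs, bilstm_probs])
score = weighted_f1(y_true, preds)
print(preds, round(score, 4))
```

Weighting each class's F1 by its support keeps the score representative when the abusive and non-abusive classes are unevenly sized, which is one common reason to report a weighted rather than a macro average.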
Pages: 252-267
Page count: 16