Enhancing Arabic Cyberbullying Detection with End-to-End Transformer Model

Cited by: 1
Authors
Mahdi, Mohamed A. [1]
Fati, Suliman Mohamed [2]
Hazber, Mohamed A. G. [1]
Ahamad, Shahanawaj [3]
Saad, Sawsan A. [4]
Affiliations
[1] Univ Hail, Coll Comp Sci & Engn, Informat & Comp Sci Engn Dept, Hail 55476, Saudi Arabia
[2] Prince Sultan Univ, Coll Comp & Informat Sci, Informat Syst Dept, Riyadh 11586, Saudi Arabia
[3] Univ Hail, Coll Comp Sci & Engn, Software Engn Dept, Hail 55476, Saudi Arabia
[4] Univ Hail, Coll Comp Sci & Engn, Comp Engn Dept, Hail 55476, Saudi Arabia
Source
CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES | 2024 / Vol. 141 / No. 02
Keywords
Cyberbullying; offensive detection; Bidirectional Encoder Representations from the Transformers (BERT); continuous bag of words; Social Media; natural language processing
DOI
10.32604/cmes.2024.052291
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Cyberbullying, a critical concern for digital safety, necessitates effective linguistic analysis tools that can navigate the complexities of language use in online spaces. To tackle this challenge, our study introduces a new approach employing the Bidirectional Encoder Representations from Transformers (BERT) base model (cased), originally pretrained in English. This model is adapted to recognize the intricate nuances of Arabic online communication, a key aspect often overlooked in conventional cyberbullying detection methods. Our model, E-BERT, is an end-to-end solution that has been fine-tuned on a diverse dataset of Arabic social media (SM) tweets. Experimental results on a diverse Arabic dataset collected from the 'X platform' demonstrate a notable increase in detection accuracy and sensitivity compared to existing methods: E-BERT achieves an accuracy of 98.45%, precision of 99.17%, recall of 99.10%, and an F1 score of 99.14%. The proposed E-BERT not only addresses a critical gap in cyberbullying detection in Arabic online forums but also sets a precedent for applying cross-lingual pretrained models in regional language applications, offering a scalable and effective framework for enhancing online safety across Arabic-speaking communities.
Pages: 1651-1671
Number of pages: 21