Cyberbullying Text Identification: A Deep Learning and Transformer-based Language Modeling Approach

Authors
Saifullah K. [1]
Khan M.I. [1]
Jamal S. [2]
Sarker I.H. [3]
Affiliations
[1] Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong
[2] Dept. of Information Technology, Georgia Southern University, Statesboro, GA
[3] Centre for Securing Digital Futures, School of Science, Edith Cowan University, Perth, 6027, WA
Keywords
Cyberbullying; deep learning; fine-tuning; harmful messages; large language modeling; natural language processing (NLP); OOV; transformer models
DOI
10.4108/EETINIS.V11I1.4703
Abstract
In the contemporary digital age, social media platforms such as Facebook, Twitter, and YouTube serve as vital channels for individuals to express ideas and connect with others. While fostering greater connectivity, these platforms have also inadvertently given rise to negative behaviors, particularly cyberbullying. Extensive research has been conducted on high-resource languages such as English, but resources remain notably scarce for low-resource languages such as Bengali, Arabic, and Tamil, particularly for language modeling. This study addresses this gap by developing BullyFilterNeT, a cyberbullying text identification system for social media texts, using Bengali as a test case. BullyFilterNeT overcomes the Out-of-Vocabulary (OOV) challenges associated with non-contextual embeddings and addresses the limitations of context-aware feature representations. To provide a comprehensive analysis, three non-contextual embedding models (GloVe, FastText, and Word2Vec) are developed for feature extraction in Bengali. These embeddings are used in three statistical classifiers (SVM, SGD, and LibSVM) and four deep learning models (CNN, VDCNN, LSTM, and GRU). To overcome the limitations of these earlier models, the study also employs six transformer-based language models: mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT. The BanglaBERT-based BullyFilterNeT achieves the highest accuracy, 88.04% on our test set, underscoring its effectiveness for cyberbullying text identification in Bengali. Copyright © 2024 K. Saifullah et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.
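The abstract links part of the OOV problem to non-contextual embeddings. As a rough illustration of that problem, and not the authors' BullyFilterNeT code, the sketch below contrasts a whole-word lookup (as in Word2Vec or GloVe) with a FastText-style vector built by averaging character n-gram vectors. The names `DIM`, `BUCKETS`, `NGRAM_TABLE`, `whole_word_lookup`, and `subword_vector` are hypothetical. The CRC32 hash and the randomly initialised n-gram table are stand-ins for fastText's own hash and trained vectors.

```python
import random
import zlib

# Hypothetical toy settings; real FastText models use far larger values.
DIM = 8
BUCKETS = 1000


def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of '<word>', with '<' and '>' marking word boundaries."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def bucket(gram):
    """Map an n-gram to a table row (CRC32 is a stand-in hash, not fastText's)."""
    return zlib.crc32(gram.encode("utf-8")) % BUCKETS


# Randomly initialised rows stand in for trained n-gram vectors (assumption).
_rng = random.Random(0)
NGRAM_TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(BUCKETS)]


def whole_word_lookup(word, vocab):
    """Word2Vec/GloVe-style lookup: a word absent from the vocabulary gets None."""
    return vocab.get(word)


def subword_vector(word):
    """Average of the word's n-gram vectors; defined even for unseen words."""
    grams = char_ngrams(word)
    if not grams:
        return [0.0] * DIM
    total = [0.0] * DIM
    for gram in grams:
        row = NGRAM_TABLE[bucket(gram)]
        for k in range(DIM):
            total[k] += row[k]
    return [value / len(grams) for value in total]


if __name__ == "__main__":
    vocab = {"bully": [0.1] * DIM}                   # toy vocabulary
    print(whole_word_lookup("bullying", vocab))      # None: OOV for a whole-word model
    print(len(subword_vector("bullying")))           # 8: built from n-grams instead
    print(sorted(set(char_ngrams("bully")) & set(char_ngrams("bullying"))))
    print(len(subword_vector("বুলিং")))              # Bengali input works the same way
```

The shared n-grams printed last (for example `<bul` and `bull`) are why a trained subword model can assign related vectors to related surface forms. Transformer models such as BanglaBERT take a different route to the same problem: their tokenizers split unseen words into subword units drawn from a fixed vocabulary.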
Pages: 1 / 12
Page count: 11