Towards comprehensive cyberbullying detection: A dataset incorporating aggressive texts, repetition, peerness, and intent to harm

Cited by: 12
Authors
Ejaz, Naveed [1]
Razi, Fakhra [2]
Choudhury, Salimur [1]
Affiliations
[1] Queens Univ, Sch Comp, Kingston, ON, Canada
[2] Iqra Univ, Dept Comp & Technol, Islamabad Campus, Islamabad, Pakistan
Keywords
Cyberbullying; Automatic cyberbullying detection; Cyberaggression; Abusive language; Social media networks; Natural language processing;
DOI
10.1016/j.chb.2023.108123
Chinese Library Classification number
B84 [Psychology];
Discipline classification code
04 ; 0402 ;
Abstract
The increasing use of social media networks has raised concerns about the growing frequency of cyberbullying incidents. The definition of cyberbullying lacks universal consensus, yet according to several authors, cyberbullying is characterized by aggressive, repetitive, and intentional communication among peers. However, existing cyberbullying detection datasets often focus solely on classifying texts as aggressive or non-aggressive, neglecting the other aspects of cyberbullying and thus hindering research progress. To address this gap, this paper proposes a framework for designing a new dataset that incorporates all four aspects of cyberbullying. The text messages are sourced from a real dataset, while the users' data is generated synthetically. The resulting dataset contains messages exchanged randomly among different pairs of users, thereby introducing repetition. Additionally, a degree of peerness is defined and calculated to measure the likelihood that two users are peers. The intent to harm is quantified as a numeric value using the ratios of aggression and repetition. As a result, the proposed dataset encompasses all four aspects of cyberbullying by providing repeated aggressive messages among users along with quantitative values of the degree of peerness and intent to harm. The proposed dataset is adaptable, with adjustable threshold values for peerness, repetition, and intent to harm, offering flexibility for various applications. The paper concludes by presenting the results of some baseline machine-learning methods on the proposed dataset.
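The abstract does not give the formulas for peerness or intent to harm, so the following Python sketch is only an illustration of how threshold-based labeling over the four aspects might look. The function names, the intent score (aggression ratio multiplied by a repetition ratio), and the default thresholds are all hypothetical assumptions, not the authors' definitions.

```python
from collections import defaultdict


def pair_statistics(messages):
    """Count total and aggressive messages per (sender, receiver) pair.

    `messages` is an iterable of (sender, receiver, is_aggressive) tuples.
    """
    stats = defaultdict(lambda: {"total": 0, "aggressive": 0})
    for sender, receiver, is_aggressive in messages:
        entry = stats[(sender, receiver)]
        entry["total"] += 1
        entry["aggressive"] += int(is_aggressive)
    return dict(stats)


def intent_to_harm(total, aggressive):
    """Hypothetical intent score in [0, 1]; NOT the formula from the paper.

    aggression ratio: share of the pair's messages that are aggressive.
    repetition ratio: share of aggressive messages that repeat a first one.
    """
    if total == 0 or aggressive == 0:
        return 0.0
    aggression_ratio = aggressive / total
    repetition_ratio = (aggressive - 1) / aggressive
    return aggression_ratio * repetition_ratio


def is_cyberbullying(total, aggressive, peerness,
                     min_repetition=2, min_peerness=0.5, min_intent=0.4):
    """Label a user pair using adjustable thresholds (assumed defaults)."""
    return (
        aggressive >= min_repetition                        # aggression + repetition
        and peerness >= min_peerness                        # peerness
        and intent_to_harm(total, aggressive) >= min_intent # intent to harm
    )


# Toy data: u1 sends u2 three aggressive messages and one neutral message.
messages = [
    ("u1", "u2", True),
    ("u1", "u2", True),
    ("u1", "u2", False),
    ("u1", "u2", True),
    ("u3", "u4", True),
]
stats = pair_statistics(messages)
for pair, counts in stats.items():
    # The peerness value 0.8 is a placeholder; the paper computes it from user data.
    print(pair, is_cyberbullying(counts["total"], counts["aggressive"], peerness=0.8))
```

Keeping the thresholds as keyword arguments mirrors the adjustability the abstract describes: the same pair statistics can be relabeled under stricter or looser definitions without rebuilding the dataset.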
Pages: 11