Multilingual Detection of Cyberbullying in Mixed Urdu, Roman Urdu, and English Social Media Conversations

Cited by: 4

Authors:

Razi, Fakhra [1]

Ejaz, Naveed [1]

Affiliation:

[1] Iqra Univ, Dept Comp & Technol, Islamabad Campus, Islamabad 44790, Pakistan

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Cyberbullying; Linguistics; Oral communication; Annotations; Blogs; Protocols; Task analysis; Social networking (online); Hate speech; automatic cyberbullying detection; social media analysis; abusive language detection; multilingual cyberbullying detection;

DOI:

10.1109/ACCESS.2024.3432908

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Automatic cyberbullying detection in social media is increasingly vital given the integral role of social networks in people's lives and the severe impact of cyberbullying. Cyberbullying involves intentional, repetitive, aggressive behaviour aimed at harming others online. Among Urdu-speaking communities worldwide, it is common to mix Urdu, Roman Urdu, and English in social media conversations. Existing research and detection methods overlook these linguistic dynamics and fail to address cyberbullying comprehensively across these languages. Additionally, no existing Urdu or Roman Urdu dataset covers the repetition and intent-to-harm components of cyberbullying. This research addresses this gap by developing and annotating a comprehensive dataset that captures linguistic variation in cyberbullying instances across Urdu, Roman Urdu, and English and incorporates all aspects of cyberbullying. In addition to the dataset, a framework for detecting cyberbullying is proposed. The framework classifies text messages as aggressive or non-aggressive and introduces novel quantitative measures for repetition and for the level of intent to cause harm. It then classifies cyberbullying by applying thresholds to the measures of aggression, repetition, and intent to harm, integrating all three aspects. Results are reported for aggression detection with fine-tuned m-BERT and MuRIL, combined with the repetition and intent-to-harm measures, on the proposed dataset. Additional experiments demonstrate the impact of repetition and intent to harm on cyberbullying classification. The best results on the dataset are achieved by fine-tuned MuRIL combined with the quantitative measures of repetition and intent to harm, with a precision of 0.93, a recall of 0.92, and an F-measure of 0.92.
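The abstract describes a final decision step that applies thresholds to three per-message measures. A minimal Python sketch of that step is shown here, assuming a conjunctive rule (a message counts as cyberbullying only if aggression, repetition, and intent to harm each reach their own threshold); this reading, the `MessageScores` and `Thresholds` names, the [0, 1] scaling, and the 0.5 threshold values are all hypothetical, since the record gives neither the measure definitions nor the thresholds used in the paper. The sketch also checks that the reported F-measure is consistent with the reported precision and recall, assuming "F-measure" means the balanced F1 score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageScores:
    """Per-message scores; field names and [0, 1] scaling are hypothetical."""
    aggression: float  # e.g. P(aggressive) from a fine-tuned classifier such as MuRIL
    repetition: float  # hypothetical normalized repetition measure
    intent: float      # hypothetical normalized intent-to-harm measure


@dataclass(frozen=True)
class Thresholds:
    """Placeholder thresholds (assumption): the paper's actual values are not given here."""
    aggression: float = 0.5
    repetition: float = 0.5
    intent: float = 0.5


def is_cyberbullying(s: MessageScores, t: Thresholds = Thresholds()) -> bool:
    """Assumed conjunctive rule: every measure must reach its threshold."""
    return (
        s.aggression >= t.aggression
        and s.repetition >= t.repetition
        and s.intent >= t.intent
    )


def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Aggressive and hostile but not repeated: rejected by the repetition check.
    print(is_cyberbullying(MessageScores(aggression=0.9, repetition=0.1, intent=0.8)))  # False
    # All three measures clear their thresholds.
    print(is_cyberbullying(MessageScores(aggression=0.9, repetition=0.7, intent=0.8)))  # True
    # Reported precision 0.93 and recall 0.92 give F1 of about 0.925.
    print(round(f_measure(0.93, 0.92), 2))  # 0.92
```

Requiring all three conditions mirrors the definition given in the abstract (intentional, repetitive, aggressive behaviour): under this rule, an isolated insult can score high on aggression yet still fail the repetition check.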


Pages: 105201-105210

Number of pages: 10

Number of references: 29