Hate-Speech and Offensive Language Detection in Roman Urdu

Cited by: 0
Authors
Rizwan, Hammad [1]
Shakeel, Muhammad Haroon [1]
Karim, Asim [1]
Affiliations
[1] Lahore Univ Management Sci LUMS, Dept Comp Sci, Lahore, Pakistan
Source
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The task of automatic hate-speech and offensive language detection in social media content is of utmost importance due to its implications for an unprejudiced society with respect to race, gender, or religion. Existing research in this area, however, is mainly focused on the English language, limiting its applicability to particular demographics. Despite its prevalence, Roman Urdu (RU) lacks language resources, annotated datasets, and language models for this task. In this study, we: (1) present a lexicon of hateful words in RU, (2) develop an annotated dataset called RUHSOLD consisting of 10,012 tweets in RU with both coarse-grained and fine-grained labels of hate-speech and offensive language, (3) explore the feasibility of transfer learning of five existing embedding models to RU, (4) propose a novel deep learning architecture called CNN-gram for hate-speech and offensive language detection and compare its performance with seven current baseline approaches on the RUHSOLD dataset, and (5) train domain-specific embeddings on more than 4.7 million tweets and make them publicly available. We conclude that transfer learning is more beneficial than training embeddings from scratch and that the proposed model exhibits greater robustness than the baselines.
Pages: 2512-2522
Number of pages: 11