Hate-Speech and Offensive Language Detection in Roman Urdu

Cited by: 0

Authors:

Rizwan, Hammad [1]

Shakeel, Muhammad Haroon [1]

Karim, Asim [1]

Affiliation:

[1] Lahore University of Management Sciences (LUMS), Department of Computer Science, Lahore, Pakistan

Source:

PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The task of automatic hate-speech and offensive language detection in social media content is of utmost importance because of its implications for an unprejudiced society with respect to race, gender, and religion. Existing research in this area, however, has focused mainly on English, which limits its applicability to particular demographics. Despite its prevalence, Roman Urdu (RU) lacks the language resources, annotated datasets, and language models needed for this task. In this study, we: (1) present a lexicon of hateful words in RU; (2) develop an annotated dataset, RUHSOLD, consisting of 10,012 RU tweets with both coarse-grained and fine-grained labels of hate-speech and offensive language; (3) explore the feasibility of transfer learning to RU with five existing embedding models; (4) propose a novel deep learning architecture, CNN-gram, for hate-speech and offensive language detection and compare its performance with seven current baseline approaches on the RUHSOLD dataset; and (5) train domain-specific embeddings on more than 4.7 million tweets and make them publicly available. We conclude that transfer learning is more beneficial than training embeddings from scratch and that the proposed model is more robust than the baselines.

Pages: 2512-2522

Number of pages: 11
