Hate-Speech and Offensive Language Detection in Roman Urdu

Authors
Rizwan, Hammad [1]
Shakeel, Muhammad Haroon [1]
Karim, Asim [1]
Affiliations
[1] Lahore Univ Management Sci LUMS, Dept Comp Sci, Lahore, Pakistan
Source
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The task of automatic hate-speech and offensive language detection in social media content is of utmost importance due to its implications in unprejudiced society concerning race, gender, or religion. Existing research in this area, however, is mainly focused on the English language, limiting the applicability to particular demographics. Despite its prevalence, Roman Urdu (RU) lacks language resources, annotated datasets, and language models for this task. In this study, we: (1) Present a lexicon of hateful words in RU, (2) Develop an annotated dataset called RUHSOLD consisting of 10,012 tweets in RU with both coarse-grained and fine-grained labels of hate-speech and offensive language, (3) Explore the feasibility of transfer learning of five existing embedding models to RU, (4) Propose a novel deep learning architecture called CNN-gram for hate-speech and offensive language detection and compare its performance with seven current baseline approaches on the RUHSOLD dataset, and (5) Train domain-specific embeddings on more than 4.7 million tweets and make them publicly available. We conclude that transfer learning is more beneficial as compared to training embeddings from scratch and that the proposed model exhibits greater robustness as compared to the baselines.
Pages: 2512-2522
Number of pages: 11