Hate-Speech and Offensive Language Detection in Roman Urdu

Citations: 0
Authors
Rizwan, Hammad [1 ]
Shakeel, Muhammad Haroon [1 ]
Karim, Asim [1 ]
Affiliations
[1] Lahore Univ Management Sci LUMS, Dept Comp Sci, Lahore, Pakistan
Source
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The task of automatic hate-speech and offensive language detection in social media content is of utmost importance due to its implications for an unprejudiced society concerning race, gender, or religion. Existing research in this area, however, is mainly focused on the English language, limiting the applicability to particular demographics. Despite its prevalence, Roman Urdu (RU) lacks language resources, annotated datasets, and language models for this task. In this study, we: (1) Present a lexicon of hateful words in RU, (2) Develop an annotated dataset called RUHSOLD consisting of 10,012 tweets in RU with both coarse-grained and fine-grained labels of hate-speech and offensive language, (3) Explore the feasibility of transfer learning of five existing embedding models to RU, (4) Propose a novel deep learning architecture called CNN-gram for hate-speech and offensive language detection and compare its performance with seven current baseline approaches on the RUHSOLD dataset, and (5) Train domain-specific embeddings on more than 4.7 million tweets and make them publicly available. We conclude that transfer learning is more beneficial than training embeddings from scratch and that the proposed model exhibits greater robustness than the baselines.
Pages: 2512 - 2522
Number of pages: 11
Related papers
50 in total
  • [1] Hate Speech Detection in Roman Urdu
    Khan, Muhammad Moin
    Shahzad, Khurram
    Malik, Muhammad Kamran
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (01)
  • [2] Automatic Detection of Offensive Language for Urdu and Roman Urdu
    Akhter, Muhammad Pervez
    Zheng Jiangbin
    Naqvi, Irfan Raza
    Abdelmajeed, Mohammed
    Sadiq, Muhammad Tariq
    IEEE ACCESS, 2020, 8 (08) : 91213 - 91226
  • [3] ON THE OFFENSIVE + HATE-SPEECH AND FREEDOM OF EXPRESSION
    Unknown
    INDEX ON CENSORSHIP, 1992, 21 (07) : 9 - 12
  • [4] Offensive Language and Hate Speech Detection for Danish
    Sigurbergsson, Gudbjartur Ingi
    Derczynski, Leon
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3498 - 3508
  • [5] Spanish hate-speech detection in football
    Montesinos-Canovas, Esteban
    Garcia-Sanchez, Francisco
    Antonio Garcia-Diaz, Jose
    Alcaraz-Marmol, Gema
    Valencia-Garcia, Rafael
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2023, (71) : 15 - 27
  • [6] Exploring Data Augmentation Strategies for Hate Speech Detection in Roman Urdu
    Azam, Ubaid
    Rizwan, Hammad
    Karim, Asim
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4523 - 4531
  • [7] Towards Automatic Detection and Explanation of Hate Speech and Offensive Language
    Dorris, Wyatt
    Hu, Ruijia
    Vishwamitra, Nishant
    Luo, Feng
    Costello, Matthew
    PROCEEDINGS OF THE SIXTH INTERNATIONAL WORKSHOP ON SECURITY AND PRIVACY ANALYTICS (IWSPA'20), 2020, : 23 - 29
  • [8] Offensive Language and Hate Speech Detection Based on Transfer Learning
    Touahri, Ibtissam
    Mazroui, Azzeddine
    ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 2, 2022, 1418 : 300 - 311
  • [9] The speech that kills (Hate-speech)
    Owen, U
    INDEX ON CENSORSHIP, 1998, 27 (01) : 32 - 39
  • [10] UHated: hate speech detection in Urdu language using transfer learning
    Muhammad Umair Arshad
    Raza Ali
    Mirza Omer Beg
    Waseem Shahzad
    Language Resources and Evaluation, 2023, 57 : 713 - 732