Hate-Speech and Offensive Language Detection in Roman Urdu

Cited by: 0
Authors
Rizwan, Hammad [1]
Shakeel, Muhammad Haroon [1]
Karim, Asim [1]
Affiliation
[1] Lahore University of Management Sciences (LUMS), Department of Computer Science, Lahore, Pakistan
Source
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The task of automatic hate-speech and offensive language detection in social media content is of utmost importance due to its implications for an unprejudiced society with respect to race, gender, or religion. Existing research in this area, however, has focused mainly on English, limiting its applicability to particular demographics. Despite its prevalence, Roman Urdu (RU) lacks language resources, annotated datasets, and language models for this task. In this study, we: (1) present a lexicon of hateful words in RU; (2) develop an annotated dataset called RUHSOLD consisting of 10,012 tweets in RU with both coarse-grained and fine-grained labels of hate-speech and offensive language; (3) explore the feasibility of transfer learning from five existing embedding models to RU; (4) propose a novel deep learning architecture called CNN-gram for hate-speech and offensive language detection and compare its performance with seven current baseline approaches on the RUHSOLD dataset; and (5) train domain-specific embeddings on more than 4.7 million tweets and make them publicly available. We conclude that transfer learning is more beneficial than training embeddings from scratch and that the proposed model is more robust than the baselines.
Pages: 2512-2522
Number of pages: 11