Hate-Speech and Offensive Language Detection in Roman Urdu

Authors
Rizwan, Hammad [1]
Shakeel, Muhammad Haroon [1]
Karim, Asim [1]
Affiliations
[1] Lahore Univ Management Sci LUMS, Dept Comp Sci, Lahore, Pakistan
Source
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The task of automatic hate-speech and offensive language detection in social media content is of utmost importance due to its implications for an unprejudiced society with respect to race, gender, or religion. Existing research in this area, however, is mainly focused on the English language, limiting its applicability to particular demographics. Despite its prevalence, Roman Urdu (RU) lacks language resources, annotated datasets, and language models for this task. In this study, we: (1) present a lexicon of hateful words in RU, (2) develop an annotated dataset called RUHSOLD consisting of 10,012 tweets in RU with both coarse-grained and fine-grained labels of hate-speech and offensive language, (3) explore the feasibility of transfer learning of five existing embedding models to RU, (4) propose a novel deep learning architecture called CNN-gram for hate-speech and offensive language detection and compare its performance with seven current baseline approaches on the RUHSOLD dataset, and (5) train domain-specific embeddings on more than 4.7 million tweets and make them publicly available. We conclude that transfer learning is more beneficial than training embeddings from scratch and that the proposed model exhibits greater robustness than the baselines.
Pages: 2512-2522
Number of pages: 11