UHated: hate speech detection in Urdu language using transfer learning

Citations: 10
Authors
Arshad, Muhammad Umair [1]
Ali, Raza [1]
Beg, Mirza Omer [1]
Shahzad, Waseem [1]
Affiliations
[1] Natl Univ Comp & Emerging Sci, Islamabad, Pakistan
Keywords
Hate speech detection; Deep learning; Language semantics; Twitter; Social network analysis; Low-resource languages
DOI
10.1007/s10579-023-09642-7
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Social media has become a driving force for social change in the global society. Events that take place in one part of the world can quickly reverberate across the globe due to the vast amount of data generated on these platforms. However, developers of these platforms face numerous challenges in keeping cyberspace as inclusive and healthy as possible. In recent years, there has been an increase in offensive and hate speech on social media. Manual efforts to address this issue have been inadequate due to the vast scope of the problem. Therefore, there is a need for an automated technique that can detect and remove offensive and hateful comments before they can cause harm. In this research, we use transfer learning to utilize pre-trained FastText Urdu word embeddings and multi-lingual BERT embeddings (RoBERTa) for our task. We also develop an Urdu language hate lexicon and use it to create an annotated dataset of 7800 Urdu tweets. Our results show that RoBERTa is able to achieve a macro F1-score of 0.82 on our multi-class classification task, outperforming deep learning and machine learning baseline models.
Pages: 713-732
Page count: 20