Detection of hate speech in Arabic tweets using deep learning

Cited by: 68
Authors
Al-Hassan, Areej [1]
Al-Dossari, Hmood [1]
Affiliations
[1] King Saud Univ, Coll Comp Sci & Informat Syst, Informat Syst Dept, Riyadh, Saudi Arabia
Keywords
Hate speech; Arabic tweets; Arabic NLP; Deep learning; Multiclassification; Social networks; Text mining;
DOI
10.1007/s00530-020-00742-w
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Nowadays, people communicate through social networks everywhere. However, verbal misbehavior such as hate speech is now noticeably propagated through these networks. One of the most popular social networks is Twitter, which has become widespread in the Arabic region. This research aims to identify and classify Arabic tweets into 5 distinct classes: none, religious, racial, sexism, or general hate. A dataset of 11K tweets was collected and labelled, and an SVM model was used as a baseline for comparison against 4 deep learning models: LSTM, CNN + LSTM, GRU, and CNN + GRU. The results show that all 4 deep learning models outperform the SVM model in detecting hateful tweets. While the SVM achieves an overall recall of 74%, the deep learning models reach an average recall of 75%. Adding a CNN layer to the LSTM enhances the overall detection performance, with 72% precision, 75% recall, and 73% F1 score.
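The overall precision, recall, and F1 figures quoted in the abstract are multi-class summary metrics. As a minimal sketch of how such figures can be computed for the five classes, the Python below derives per-class scores and a support-weighted average. The abstract does not state which averaging scheme the authors used, so the weighted average here is an assumption, and the label strings, function names, and toy predictions are hypothetical illustrations rather than data from the paper.

```python
from collections import Counter

# Hypothetical label names for the five classes named in the abstract.
LABELS = ["none", "religious", "racial", "sexism", "general"]


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def weighted_average(scores, y_true):
    """Average each metric over classes, weighted by class support.

    Assumption: the paper's "overall" figures may use a different scheme
    (for example macro averaging); this is only one common choice.
    """
    support = Counter(y_true)
    total = len(y_true)
    return tuple(
        sum(scores[label][i] * support[label] for label in scores) / total
        for i in range(3)
    )


# Toy example (made-up predictions, not results from the paper).
y_true = ["none", "none", "religious", "racial", "sexism", "general"]
y_pred = ["none", "religious", "religious", "racial", "none", "general"]

scores = per_class_scores(y_true, y_pred)
precision, recall, f1 = weighted_average(scores, y_true)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With support weighting, the overall recall equals plain accuracy, which is one reason some papers also report macro-averaged scores when classes are imbalanced.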
Pages: 1963-1974
Page count: 12