A systematic literature review of hate speech identification on Arabic Twitter data: research challenges and future directions

Cited by: 6

Authors:

Alhazmi, Ali ^{[1,2]}

Mahmud, Rohana ^{[1]}

Idris, Norisma ^{[1]}

Abo, Mohamed Elhag Mohamed ^{[3]}

Eke, Christopher ^{[4]}

Affiliations:

[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur, Malaysia

[2] Jazan Univ, Dept Informat Technol & Secur, Jazan, Saudi Arabia

[3] Future Univ, Dept Comp Sci, Khartoum, Sudan

[4] Fed Univ Lafia, Dept Comp Sci, Fac Comp, Lafia, Nasarawa State, Nigeria

Source:

PEERJ COMPUTER SCIENCE | 2024 / Volume 10

Keywords:

Arabic tweets; Automatic identification; Classification techniques; Hate speech; Natural language processing; SLR;

DOI:

10.7717/peerj-cs.1966

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The automatic identification of hate speech in Arabic tweets has attracted substantial attention from researchers in text mining and natural language processing (NLP), and the number of studies on the subject has grown significantly. This study provides an overview of the field through a systematic review of the literature on automatic hate speech identification, particularly in the Arabic language. The goal is to examine research trends in Arabic hate speech identification and to guide researchers by highlighting the most significant studies published between 2018 and 2023. This systematic study addresses five specific research questions concerning the varieties of Arabic used, hate speech categories, classification techniques, feature engineering techniques, performance metrics, validation methods, existing challenges faced by researchers, and potential future research directions. A comprehensive search across nine academic databases identified 24 studies that met the predefined inclusion criteria and quality assessment. The review found that many Arabic linguistic varieties are used in hate speech on Twitter, with Modern Standard Arabic (MSA) being the most prominent. Among identification techniques, machine learning is the most widely used category for Arabic hate speech identification. The results also show a range of feature engineering techniques, of which N-gram and CBOW are the most used. F1-score, precision, recall, and accuracy were identified as the most used performance metrics, and the train/test split is the most used validation method. The findings of this study can therefore serve as valuable guidance for researchers seeking to improve the efficacy of their models in future investigations. In addition, the findings have real-world applications in algorithm development, policy and rule regulation, community management, and legal and ethical considerations.

Pages: 43
