Toxic language detection: A systematic review of Arabic datasets

Cited by: 3

Authors:

Bensalem, Imene [1,2]

Rosso, Paolo [3]

Zitouni, Hanane [4]

Affiliations:

[1] ESCF Constantine, Constantine, Algeria

[2] Constantine 2 Univ, MISC Lab, Constantine, Algeria

[3] Univ Politecn Valencia, Valencia, Spain

[4] Constantine 2 Univ, Constantine, Algeria

Source:

EXPERT SYSTEMS | 2024 / Volume 41 / Issue 08

Keywords:

annotation; Arabic datasets; dataset accessibility; dataset reusability; hate speech; offensive language; toxic language;

DOI:

10.1111/exsy.13551

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The detection of toxic language in the Arabic language has emerged as an active area of research in recent years, and reviewing the existing datasets employed for training the developed solutions has become a pressing need. This paper offers a comprehensive survey of Arabic datasets focused on online toxic language. We systematically gathered a total of 54 available datasets and their corresponding papers and conducted a thorough analysis, considering 18 criteria across four primary dimensions: availability details, content, annotation process, and reusability. This analysis enabled us to identify existing gaps and make recommendations for future research works. For the convenience of the research community, the list of the analysed datasets is maintained in a GitHub repository.

Citations

Pages: 30

50 entries in total

[1] A Survey of Offensive Language Detection for the Arabic Language
Husain, Fatemah
Uzuner, Ozlem
ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (01)
[2] Fake-news detection: a survey of evaluation Arabic datasets
Yousef, Mohammed Abbas
Elkorany, Abeer
Bayomi, Hanaa
SOCIAL NETWORK ANALYSIS AND MINING, 2024, 14 (01)
[3] Offensive Language Detection from Arabic Texts
Awajan, Arafat A.
INTELLIGENT COMPUTING, VOL 3, 2024, 2024, 1018 : 77 - 91
[4] Detection of Hateful Social Media Content for Arabic Language
Al-Ibrahim, Rogayah M.
Ali, Mostafa Z.
Najadat, Hassan M.
ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (09)
[5] A systematic review of hate speech automatic detection using natural language processing
Jahan, Md Saroar
Oussalah, Mourad
NEUROCOMPUTING, 2023, 546
[6] A Multi-Platform Arabic News Comment Dataset for Offensive Language Detection
Chowdhury, Shammur A.
Mubarak, Hamdy
Abdelali, Ahmed
Jung, Soon-gyo
Jansen, Bernard J.
Salminen, Joni
PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6203 - 6212
[7] Detection of Arabic offensive language in social media using machine learning models
Mousa, Aya
Shahin, Ismail
Nassif, Ali Bou
Elnagar, Ashraf
INTELLIGENT SYSTEMS WITH APPLICATIONS, 2024, 22
[8] Hate Speech Detection using Word Embedding and Deep Learning in the Arabic Language Context
Faris, Hossam
Aljarah, Ibrahim
Habib, Maria
Castillo, Pedro A.
ICPRAM: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2020, : 453 - 460
[9] Augmented Behavioral Annotation Tools, with Application to Multimodal Datasets and Models: A Systematic Review
Watson, Eleanor
Viana, Thiago
Zhang, Shujun
AI, 2023, 4 (01) : 128 - 171
[10] Arabic Text Mining: A Systematic Review of the Published Literature 2002-2014
Al-Mahmoud, Hind
Al-Razgan, Muna
2015 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (ICCC), 2015, : 65 - 71
