Toxic language detection: A systematic review of Arabic datasets

Cited by: 3

Authors:

Bensalem, Imene [1,2]

Rosso, Paolo [3]

Zitouni, Hanane [4]
Affiliations:

[1] ESCF Constantine, Constantine, Algeria

[2] Constantine 2 Univ, MISC Lab, Constantine, Algeria

[3] Univ Politecn Valencia, Valencia, Spain

[4] Constantine 2 Univ, Constantine, Algeria

Source:

EXPERT SYSTEMS | 2024 / Vol. 41 / Issue 08

Keywords:

annotation; Arabic datasets; dataset accessibility; dataset reusability; hate speech; offensive language; toxic language

DOI:

10.1111/exsy.13551

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The detection of toxic language in the Arabic language has emerged as an active area of research in recent years, and reviewing the existing datasets employed for training the developed solutions has become a pressing need. This paper offers a comprehensive survey of Arabic datasets focused on online toxic language. We systematically gathered a total of 54 available datasets and their corresponding papers and conducted a thorough analysis, considering 18 criteria across four primary dimensions: availability details, content, annotation process, and reusability. This analysis enabled us to identify existing gaps and make recommendations for future research works. For the convenience of the research community, the list of the analysed datasets is maintained in a GitHub repository.

Pages: 30

