Toxic language detection: A systematic review of Arabic datasets

Cited by: 3

Authors:

Bensalem, Imene [1,2]

Rosso, Paolo [3]

Zitouni, Hanane [4]

Affiliations:

[1] ESCF Constantine, Constantine, Algeria

[2] Constantine 2 Univ, MISC Lab, Constantine, Algeria

[3] Univ Politecn Valencia, Valencia, Spain

[4] Constantine 2 Univ, Constantine, Algeria

Source:

EXPERT SYSTEMS | 2024, Volume 41, Issue 08

Keywords:

annotation; Arabic datasets; dataset accessibility; dataset reusability; hate speech; offensive language; toxic language;

DOI:

10.1111/exsy.13551

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The detection of toxic language in the Arabic language has emerged as an active area of research in recent years, and reviewing the existing datasets employed for training the developed solutions has become a pressing need. This paper offers a comprehensive survey of Arabic datasets focused on online toxic language. We systematically gathered a total of 54 available datasets and their corresponding papers and conducted a thorough analysis, considering 18 criteria across four primary dimensions: availability details, content, annotation process, and reusability. This analysis enabled us to identify existing gaps and make recommendations for future research works. For the convenience of the research community, the list of the analysed datasets is maintained in a GitHub repository.

Pages: 30

