Toxic language detection: A systematic review of Arabic datasets

Times cited: 3
Authors
Bensalem, Imene [1, 2]
Rosso, Paolo [3]
Zitouni, Hanane [4]
Affiliations
[1] ESCF Constantine, Constantine, Algeria
[2] Constantine 2 Univ, MISC Lab, Constantine, Algeria
[3] Univ Politecn Valencia, Valencia, Spain
[4] Constantine 2 Univ, Constantine, Algeria
Keywords
annotation; Arabic datasets; dataset accessibility; dataset reusability; hate speech; offensive language; toxic language;
DOI
10.1111/exsy.13551
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Detecting toxic language in Arabic has become an active research area in recent years, creating a pressing need to review the datasets used to train the proposed detection systems. This paper offers a comprehensive survey of Arabic datasets focused on online toxic language. We systematically collected 54 available datasets, together with their corresponding papers, and analysed them against 18 criteria grouped into four main dimensions: availability details, content, annotation process, and reusability. This analysis allowed us to identify existing gaps and to make recommendations for future research. For the convenience of the research community, the list of analysed datasets is maintained in a GitHub repository.
Pages: 30