A survey on multi-lingual offensive language detection

Cited by: 1
Authors
Mnassri, Khouloud [1]
Farahbakhsh, Reza [1]
Chalehchaleh, Razieh [1]
Rajapaksha, Praboda [1]
Jafari, Amir Reza [1]
Li, Guanlin [1]
Crespi, Noel [1]
Affiliations
[1] Inst Polytech Paris, Telecom SudParis, Samovar, Palaiseau, France
Keywords
Literature review; Offensive language; Hate speech; Multilingualism; Social media; HATE SPEECH DETECTION; IDENTIFICATION; CLASSIFICATION;
DOI
10.7717/peerj-cs.1934
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offensive content is increasingly prevalent on online communication and social media platforms, which makes its detection difficult, especially in multilingual settings. The term "Offensive Language" encompasses a wide range of expressions, including various forms of hate speech and aggressive content. Exploring multilingual offensive content therefore goes beyond a single-language focus and captures greater linguistic diversity and more cultural factors. By exploring multilingual offensive content, we can broaden our understanding and effectively combat the widespread global impact of offensive language. This survey examines the existing state of multilingual offensive language detection, including a comprehensive analysis of previous multilingual approaches and existing datasets, and provides resources in the field. We also explore the challenges the community faces on this task, including technical, cultural, and linguistic ones, as well as their limitations. Furthermore, we propose several potential future directions toward more efficient solutions for multilingual offensive language detection, enabling a safer digital communication environment worldwide.
Pages: 48