Handling Bias in Toxic Speech Detection: A Survey

Cited by: 18
Authors
Garg, Tanmay [1]
Masud, Sarah [1]
Suresh, Tharun [1]
Chakraborty, Tanmoy [1]
Affiliations
[1] IIIT Delhi, Delhi, India
Keywords
Toxic speech; hate speech; social networks; unintended bias; bias mitigation; bias shift; cognition
DOI
10.1145/3580494
CLC Classification Number
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors such as the context, geography, socio-political climate, and background of the producers and consumers of a post play a crucial role in determining whether its content can be flagged as toxic. Deploying automated toxicity detection models in production can therefore sideline the very groups they aim to help in the first place. This has piqued researchers' interest in examining unintended biases and their mitigation. Owing to the nascent and multi-faceted nature of the work, the literature as a whole is chaotic in its terminologies, techniques, and findings. In this article, we put together a systematic study of the limitations and challenges of existing methods for mitigating bias in toxicity detection. We look closely at proposed methods for evaluating and mitigating bias in toxic speech detection. To examine the limitations of existing methods, we also conduct a case study to introduce the concept of bias shift due to knowledge-based bias mitigation. The survey concludes with an overview of the critical challenges, research gaps, and future directions. While reducing toxicity on online platforms continues to be an active area of research, a systematic study of various biases and their mitigation strategies will help the research community produce robust and fair models.
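As a minimal, hedged illustration of the kind of unintended-bias evaluation the survey discusses (not code from the paper itself), the sketch below slices a toxicity classifier's predictions by identity terms and compares per-subgroup AUC and false-positive rates against the overall test set, in the spirit of standard identity-term bias metrics. The toy examples, the identity lexicon, and the 0.5 decision threshold are assumptions made purely for demonstration.

# Sketch: measuring identity-term bias in a toxicity classifier via
# subgroup AUC and false-positive-rate gaps (toy data, assumed threshold).
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy test set: (text, gold_label, model_score). 1 = toxic, 0 = non-toxic.
examples = [
    ("you are an idiot",                  1, 0.92),
    ("have a great day",                  0, 0.05),
    ("I am a proud gay man",              0, 0.71),  # non-toxic identity mention
    ("gay people deserve respect",        0, 0.64),
    ("all of them should disappear",      1, 0.88),
    ("muslim families joined the picnic", 0, 0.58),
    ("I hate every muslim",               1, 0.95),
    ("the weather is lovely",             0, 0.03),
]

identity_terms = ["gay", "muslim"]  # assumed identity lexicon
threshold = 0.5                     # assumed decision threshold

labels = np.array([y for _, y, _ in examples])
scores = np.array([s for _, _, s in examples])
preds = (scores >= threshold).astype(int)

def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN); returns nan if the slice has no negatives."""
    negatives = y_true == 0
    return np.nan if negatives.sum() == 0 else y_pred[negatives].mean()

print(f"Overall AUC: {roc_auc_score(labels, scores):.3f}, "
      f"overall FPR: {false_positive_rate(labels, preds):.3f}")

for term in identity_terms:
    mask = np.array([term in text.lower() for text, _, _ in examples])
    if mask.sum() < 2 or len(set(labels[mask])) < 2:
        # Subgroup AUC needs both classes; real evaluations use far larger slices.
        print(f"[{term}] subgroup too small for AUC; subgroup FPR: "
              f"{false_positive_rate(labels[mask], preds[mask]):.3f}")
        continue
    print(f"[{term}] subgroup AUC: {roc_auc_score(labels[mask], scores[mask]):.3f}, "
          f"subgroup FPR: {false_positive_rate(labels[mask], preds[mask]):.3f}")

A subgroup FPR that is much higher than the overall FPR (as the toy identity slices show here) is one common symptom of the unintended identity-term bias the survey analyses; mitigation methods aim to shrink such gaps without degrading overall detection quality.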
Pages: 32
Related Papers
50 entries in total
  • [21] Handling data scarcity through data augmentation for detecting offensive speech
    Sekkate, Sara
    Chebbi, Safa
    Adib, Abdellah
    Ben Jebara, Sofia
    ANNALS OF TELECOMMUNICATIONS, 2025
  • [22] Teachers' intervention strategies for handling hate-speech incidents in schools
    Bilz, Ludwig
    Fischer, Saskia M.
    Kansok-Dusche, Julia
    Wachs, Sebastian
    Wettstein, Alexander
    SOCIAL PSYCHOLOGY OF EDUCATION, 2024, 27 (05) : 2701 - 2724
  • [23] A valid question: Could hate speech condition bias in the brain?
    Murrow, Gail B.
    Murrow, Richard
    JOURNAL OF LAW AND THE BIOSCIENCES, 2016, 3 (01) : 196 - 201
  • [25] MLHS-CGCapNet: A Lightweight Model for Multilingual Hate Speech Detection
    Kousar, Abida
    Ahmad, Jameel
    Ijaz, Khalid
    Yousef, Amr
    Ahmed Shaikh, Zaffar
    Khosa, Ikramullah
    Chavali, Durga
    Anjum, Mohd
    IEEE ACCESS, 2024, 12 : 106631 - 106644
  • [26] A comparison of text preprocessing techniques for hate and offensive speech detection in Twitter
    Glazkova, Anna
    SOCIAL NETWORK ANALYSIS AND MINING, 13
  • [27] Nationalist bias in Turkish official discourse on hate speech: a Rawlsian criticism
    Deveci, Cem
    Kinik, Burcu Nur Binbuga
    TURKISH STUDIES, 2019, 20 (01) : 26 - 48
  • [28] Violence, Hate Speech, and Gender Bias: Challenges to an Inclusive Digital Environment
    Romer-Pieretti, Max
    Esteban-Ramiro, Beatriz
    Silva, Agrivalca Canelon
    SOCIAL INCLUSION, 2025, 13
  • [29] Utilizing subjectivity level to mitigate identity term bias in toxic comments classification
    Zhao, Zhixue
    Zhang, Ziqi
    Hopfgartner, Frank
    ONLINE SOCIAL NETWORKS AND MEDIA, 2022, 29
  • [30] Topic Oriented Hate Speech Detection
    Jamil, Raihan
    Khan, Mohammad Abdullah Al Nayeem
    Anwar, Md Musfique
    HYBRID INTELLIGENT SYSTEMS, HIS 2021, 2022, 420 : 365 - 375