FairShades: Fairness Auditing via Explainability in Abusive Language Detection Systems

Cited by: 3
Authors
Manerba, Marta Marchiori [1]
Guidotti, Riccardo [1]
Affiliations
[1] Univ Pisa, Pisa, Italy
Source
2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI 2021) | 2021
DOI
10.1109/CogMI52975.2021.00014
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
At every stage of a supervised learning process, harmful biases can arise and be inadvertently introduced, ultimately leading to marginalization, discrimination, and abuse towards minorities. This phenomenon becomes particularly impactful in the sensitive real-world context of abusive language detection systems, where non-discrimination is difficult to assess. In addition, given the opaqueness of their internal behavior, the dynamics leading a model to a certain decision are often neither clear nor accountable, and significant problems of trust could emerge. A robust value-oriented evaluation of models' fairness is therefore necessary. In this paper, we present FairShades, a model-agnostic approach for auditing the outcomes of abusive language detection systems. Combining explainability and fairness evaluation, FairShades can identify unintended biases and the sensitive categories towards which models are most discriminatory. This objective is pursued through the auditing of meaningful counterfactuals generated within the CheckList framework. We conduct several experiments on BERT-based models to demonstrate our proposal's novelty and effectiveness for unmasking biases.
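The abstract does not spell out how the counterfactual audit works, so the following is only a minimal sketch of the general idea, not the FairShades implementation. It assumes that counterfactuals are sentences that differ solely in one identity term, and that a model is flagged when its scores diverge across such sentences. The templates, the identity terms, the `toy_classifier` stand-in, the `audit` helper, and the 0.2 threshold are all hypothetical; the sketch does not use the CheckList library itself.

```python
from typing import Callable, List, Tuple

# Hypothetical templates and identity terms, chosen only for illustration.
TEMPLATES = [
    "I talked to {identity} people today.",
    "Those {identity} neighbors moved in last week.",
]
IDENTITIES = ["young", "elderly", "tall"]


def toy_classifier(text: str) -> float:
    """Stand-in for a real abusive-language model.

    Returns a pseudo 'abusive' score and is deliberately biased against
    one identity term so that the audit has something to find.
    """
    return 0.9 if "elderly" in text else 0.1


def audit(
    model: Callable[[str], float],
    templates: List[str],
    identities: List[str],
    threshold: float = 0.2,  # assumed tolerance, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Flag templates whose score changes when only the identity term changes.

    Returns (template, highest-scoring identity term, score gap) for every
    template whose gap between the highest and lowest score exceeds threshold.
    """
    flagged = []
    for template in templates:
        # Counterfactuals: the same sentence with each identity term filled in.
        scores = {term: model(template.format(identity=term)) for term in identities}
        gap = max(scores.values()) - min(scores.values())
        if gap > threshold:
            worst = max(scores, key=scores.get)
            flagged.append((template, worst, gap))
    return flagged


if __name__ == "__main__":
    for template, term, gap in audit(toy_classifier, TEMPLATES, IDENTITIES):
        print(f"{template!r}: highest score for {term!r}, gap {gap:.2f}")
```

Because `audit` only calls the model as a function from text to score, the same loop applies to any classifier wrapped that way, which is what makes this kind of check model-agnostic.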
Pages: 34-43
Page count: 10