Diversity-aware fairness testing of machine learning classifiers through hashing-based sampling

Cited by: 1
Authors
Zhao, Zhenjiang [1 ]
Toda, Takahisa [1 ]
Kitamura, Takashi [2 ]
Affiliations
[1] University of Electro-Communications, Graduate School of Informatics and Engineering, Tokyo, Japan
[2] National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
Keywords
Algorithm fairness; Fairness testing; SAT/SMT solving; Constraint sampling; Hashing-based technique
DOI
10.1016/j.infsof.2023.107390
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification
0812
Abstract
Context: There are growing concerns about algorithmic fairness, as some machine learning (ML)-based algorithms have been found to exhibit biases against protected attributes such as gender, race, and age. Individual fairness requires an ML classifier to produce similar outputs for similar individuals. Verification-Based Testing (VBT) is a state-of-the-art black-box testing algorithm for individual fairness that leverages constraint solving to generate test cases.

Objective: Generating diverse test cases is expected to facilitate efficient detection of diverse discriminatory data instances (i.e., cases that violate individual fairness). Hashing-based sampling techniques draw a sample approximately uniformly at random from the set of solutions of given Boolean constraints. We propose VBT-X, which augments VBT with hashing-based sampling, aiming to improve its testing performance.

Method: We realize hashing-based sampling for VBT. The challenge is that off-the-shelf hashing-based sampling techniques cannot be integrated in a straightforward manner, because the constraints in VBT are generally not Boolean. Moreover, we propose several enhancement techniques to make VBT-X more efficient.

Results: To evaluate our method, we conduct experiments in which VBT-X is compared to VBT, SG, and ExpGA (other well-known fairness testing algorithms) over a set of configurations spanning several datasets, protected attributes, and ML classifiers. The results show that, in each configuration, VBT-X detects more discriminatory data instances, with higher diversity, than VBT and SG. VBT-X also detects discriminatory data instances with higher diversity than ExpGA, although it detects fewer such instances than ExpGA does.

Conclusion: Our proposed method outperforms other state-of-the-art black-box fairness testing algorithms, particularly in terms of diversity. It can serve to efficiently identify flaws in ML classifiers with respect to individual fairness, for subsequent improvement of the classifier. Although our method is specific to individual fairness, with some technical adaptations it could be applied to testing other aspects of a software system, such as security and counterfactual explanations; this remains future work.
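To make the notion of a discriminatory data instance concrete, the following minimal Python sketch (illustrative only, not the paper's implementation; the function name `is_discriminatory` and the exact-equality check on outputs are assumptions) flips only a protected attribute and compares the classifier's predictions:

```python
# Illustrative sketch, not the paper's code: an input pair that differs
# only in a protected attribute but receives different predictions
# violates individual fairness.

def is_discriminatory(classifier, x, protected_idx, protected_values):
    """Return a counterpart of x witnessing a violation, or None.

    classifier: callable mapping a feature vector to a label
    x: feature vector (list)
    protected_idx: index of the protected attribute in x
    protected_values: domain of the protected attribute
    """
    y = classifier(x)
    for v in protected_values:
        if v == x[protected_idx]:
            continue
        x_alt = list(x)
        x_alt[protected_idx] = v      # change only the protected attribute
        if classifier(x_alt) != y:    # similar individuals, different outputs
            return x_alt
    return None
```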
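The idea behind hashing-based sampling can likewise be sketched in a few lines: random XOR (parity) constraints partition the solution set into cells of roughly equal size, and picking a survivor from one small cell yields an approximately uniform sample. The brute-force enumeration below is a toy illustration under that assumption; practical tools work on CNF with a SAT solver rather than enumerating assignments, and the function `sample_solution` is hypothetical:

```python
import itertools
import random

def sample_solution(n_vars, formula, n_xors):
    """Draw one solution of `formula` approximately uniformly at random.

    formula: predicate over a tuple of n_vars Booleans (0/1)
    n_xors:  number of random parity constraints; choosing it near
             log2(#solutions) leaves O(1) survivors per hash cell
    Brute-force enumeration, for illustration only.
    """
    # Each random hash is h(x) = (a . x) xor b with a in {0,1}^n, b in {0,1}.
    xors = [([random.randint(0, 1) for _ in range(n_vars)], random.randint(0, 1))
            for _ in range(n_xors)]

    def in_cell(x):
        return all(sum(ai & xi for ai, xi in zip(a, x)) % 2 == b
                   for a, b in xors)

    # Survivors = solutions falling into the randomly chosen hash cell.
    cell = [x for x in itertools.product((0, 1), repeat=n_vars)
            if formula(x) and in_cell(x)]
    return random.choice(cell) if cell else None  # None: empty cell, retry

# Example: sample a model of (x0 or x1) and (not x2).
print(sample_solution(3, lambda x: (x[0] or x[1]) and not x[2], n_xors=1))
```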
Pages: 19
Related Papers
3 items in total
  • [1] Zhang, Peixin; Wang, Jingyi; Sun, Jun; Wang, Xinyu; Dong, Guoliang; Wang, Xingen; Dai, Ting; Dong, Jin Song. Automatic Fairness Testing of Neural Classifiers Through Adversarial Sampling. IEEE Transactions on Software Engineering, 2022, 48(9): 3593-3612.
  • [2] Zhao, Zhenjiang; Toda, Takahisa; Kitamura, Takashi. Efficient Fairness Testing Through Hash-Based Sampling. Search-Based Software Engineering (SSBSE 2022), vol. 13711, 2022: 35-50.
  • [3] Perera, Anjana; Aleti, Aldeida; Tantithamthavorn, Chakkrit; Jiarpakdee, Jirayus; Turhan, Burak; Kuhn, Lisa; Walker, Katie. Search-based fairness testing for regression-based machine learning systems. Empirical Software Engineering, 2022, 27(3).