A Comprehensive Empirical Study of Bias Mitigation Methods for Machine Learning Classifiers

Cited by: 26
Authors
Chen, Zhenpeng [1 ]
Zhang, Jie M. [2 ]
Sarro, Federica [1 ]
Harman, Mark [1 ]
Affiliations
[1] UCL, Dept Comp Sci, Gower St, London WC1E 6BT, England
[2] Kings Coll London, Dept Informat, 30 Aldwych, London WC2B 4BG, England
Keywords
Machine Learning; bias mitigation; fairness-performance trade-off;
DOI
10.1145/3583561
Chinese Library Classification
TP31 [Computer Software];
Discipline Classification Codes
081202; 0835;
Abstract
Software bias is an increasingly important operational concern for software engineers. We present a large-scale, comprehensive empirical study of 17 representative bias mitigation methods for Machine Learning (ML) classifiers, evaluated with 11 ML performance metrics (e.g., accuracy), 4 fairness metrics, and 20 types of fairness-performance trade-off assessment, applied to 8 widely adopted software decision tasks. The empirical coverage is much more comprehensive than previous work on this important software property, covering the largest number of bias mitigation methods, evaluation metrics, and fairness-performance trade-off measures. We find that (1) the bias mitigation methods significantly decrease ML performance in 53% of the studied scenarios (ranging between 42% and 66% according to different ML performance metrics); (2) the bias mitigation methods significantly improve fairness measured by the 4 used metrics in 46% of all the scenarios (ranging between 24% and 59% according to different fairness metrics); (3) the bias mitigation methods even lead to a decrease in both fairness and ML performance in 25% of the scenarios; (4) the effectiveness of the bias mitigation methods depends on tasks, models, the choice of protected attributes, and the set of metrics used to assess fairness and ML performance; (5) there is no bias mitigation method that achieves the best trade-off in all the scenarios. The best method that we find outperforms other methods in 30% of the scenarios. Researchers and practitioners therefore need to choose the bias mitigation method best suited to their intended application scenario(s).
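To make the fairness-performance trade-off described in the abstract concrete, the following is a minimal, self-contained sketch (not the paper's actual pipeline or metric suite) of how one fairness metric, statistical parity difference, and one ML performance metric, accuracy, might be computed side by side for a single classifier and a single binary protected attribute. The synthetic data, logistic regression model, and protected-attribute encoding are illustrative assumptions.

```python
# Sketch: compute one ML performance metric (accuracy) and one fairness metric
# (statistical parity difference) for a classifier on synthetic data.
# All data, model, and threshold choices below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic tabular data: 2 numeric features plus a binary protected attribute (column 2).
n = 2000
X = rng.normal(size=(n, 3))
X[:, 2] = (X[:, 2] > 0).astype(float)  # protected attribute: 0 (unprivileged) / 1 (privileged)
y = ((X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n)) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# ML performance metric: accuracy.
acc = accuracy_score(y_test, y_pred)

# Fairness metric: statistical parity difference,
# P(pred = 1 | unprivileged) - P(pred = 1 | privileged); closer to 0 is fairer.
protected = X_test[:, 2]
spd = y_pred[protected == 0].mean() - y_pred[protected == 1].mean()

print(f"accuracy = {acc:.3f}, statistical parity difference = {spd:.3f}")
```

A bias mitigation method (e.g., a pre-processing method such as reweighing) would be applied to the training data or model before these two metrics are recomputed, and the resulting changes in fairness and performance would then feed a trade-off assessment.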
Pages: 30