Lazy Data Practices Harm Fairness Research

Citations: 0
Authors
Simson, Jan [1, 2]
Fabris, Alessandro [3]
Kern, Christoph [1, 2, 4]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Munich, Germany
[2] Munich Ctr Machine Learning MCML, Munich, Germany
[3] Max Planck Inst Secur & Privacy, Bochum, Germany
[4] Univ Maryland, College Pk, MD 20742 USA
Source
PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024 | 2024
Keywords
critical data studies; protected groups; fair ML generalization; reproducibility; DISCRIMINATION; STEREOTYPES; ISLAMOPHOBIA; RELIGION; MERIT;
DOI
10.1145/3630106.3658931
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field by highlighting shortcomings and proposing recommendations for improvement. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research. By conducting exemplary analyses on the utilization of prominent datasets, we demonstrate how unreflective data decisions disproportionately affect minority groups, fairness metrics, and resultant model comparisons. Additionally, we identify supplementary factors such as limitations in publicly available data, privacy considerations, and a general lack of awareness, which exacerbate these challenges. To address these issues, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
Pages: 642 - 659
Page count: 18
Related papers
50 in total
  • [31] Analytical code sharing practices in biomedical research
    Sharma, Nitesh Kumar
    Ayyala, Ram
    Deshpande, Dhrithi
    Patel, Yesha
    Munteanu, Viorel
    Ciorba, Dumitru
    Bostan, Viorel
    Fiscutean, Andrada
    Vahed, Mohammad
    Sarkar, Aditya
    Guo, Ruiwei
    Moore, Andrew
    Darci-Maher, Nicholas
    Nogoy, Nicole
    Abedalthaga, Malak
    Mangul, Serghei
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [32] Data Fairness to Find Biases That Influence the Algorithm's Decision Making Results
    Soares, Leticia Sakamoto
    da Silva, Leandro Augusto
    PROCEEDINGS OF THE 3RD EUROPEAN CONFERENCE ON THE IMPACT OF ARTIFICIAL INTELLIGENCE AND ROBOTICS (ECIAIR 2021), 2021, : 217 - 225
  • [33] Transparent and Reproducible Research Practices in the Surgical Literature
    Hughes, Bryan Taylor
    Niemann, Andrew
    Tritz, Daniel
    Boyer, Kryston
    Robbins, Hal
    Vassar, Matt
    JOURNAL OF SURGICAL RESEARCH, 2022, 274 : 116 - 124
  • [34] Reproducible Research Practices and Barriers to Reproducible Research in Geography: Insights from a Survey
    Kedron, Peter
    Holler, Joseph
    Bardin, Sarah
    ANNALS OF THE AMERICAN ASSOCIATION OF GEOGRAPHERS, 2024, 114 (02) : 369 - 386
  • [35] Numbers will not save us: Agonistic data practices
    Crooks, Roderic
    Currie, Morgan
    INFORMATION SOCIETY, 2021, 37 (04) : 201 - 213
  • [36] How Could Equality and Data Protection Law Shape AI Fairness for People with Disabilities?
    Binns, Reuben
    Kirkham, Reuben
    ACM TRANSACTIONS ON ACCESSIBLE COMPUTING, 2021, 14 (03)
  • [37] The use of animals for research on animal diseases: Its impact on the harm-benefit analysis
    Rickard, MD
    ATLA-ALTERNATIVES TO LABORATORY ANIMALS, 2004, 32 : 225 - 227
  • [38] Understanding experiments and research practices for reproducibility: an exploratory study
    Samuel, Sheeba
    Koenig-Ries, Birgitta
    PEERJ, 2021, 9
  • [39] Ethics and international business research: Considerations and best practices
    Miller, Stewart R.
    Moore, Fiona
    Eden, Lorraine
    INTERNATIONAL BUSINESS REVIEW, 2024, 33 (01)
  • [40] Analysis of practices to promote reproducibility and transparency in anaesthesiology research
    Okonya, Ochije
    Rorah, Drayton
    Tritz, Daniel
    Umberham, Blake
    Wiley, Matt
    Vassar, Matt
    BRITISH JOURNAL OF ANAESTHESIA, 2020, 125 (05) : 835 - 842