Lazy Data Practices Harm Fairness Research

Times cited: 0
Authors
Simson, Jan [1, 2]
Fabris, Alessandro [3]
Kern, Christoph [1, 2, 4]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Munich, Germany
[2] Munich Ctr Machine Learning MCML, Munich, Germany
[3] Max Planck Inst Secur & Privacy, Bochum, Germany
[4] Univ Maryland, College Pk, MD 20742 USA
Source
PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024 | 2024
Keywords
critical data studies; protected groups; fair ML generalization; reproducibility; discrimination; stereotypes; Islamophobia; religion; merit
DOI
10.1145/3630106.3658931
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field by highlighting shortcomings and proposing recommendations for improvement. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study the protected information encoded in tabular datasets and its usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing that threatens the generalization of fairness research. Through exemplary analyses of how prominent datasets are used, we demonstrate how unreflective data decisions disproportionately affect minority groups, fairness metrics, and the resulting model comparisons. Additionally, we identify supplementary factors that exacerbate these challenges, such as limitations in publicly available data, privacy considerations, and a general lack of awareness. To address these issues, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions for improving both the sourcing and the usage of datasets.
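The second concern above, that excluding minority groups during preprocessing can change fairness metrics, can be illustrated with a minimal sketch. It is not the authors' analysis code: the groups "A", "B", "C", their predictions, the helper names, and the min_size threshold are all hypothetical, and the metric shown is the demographic parity difference (the largest gap in positive-prediction rate between any two groups), chosen here only as one common example.

```python
from collections import defaultdict

def positive_rates(records):
    """Share of positive predictions (1) per protected group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for group, pred in records:
        counts[group][0] += pred
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

def demographic_parity_difference(records):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = positive_rates(records)
    return max(rates.values()) - min(rates.values())

def drop_small_groups(records, min_size):
    """Hypothetical preprocessing shortcut: discard groups with fewer than min_size rows."""
    sizes = defaultdict(int)
    for group, _ in records:
        sizes[group] += 1
    return [(g, p) for g, p in records if sizes[g] >= min_size]

# Invented toy data: (protected_group, model_prediction) pairs.
records = (
    [("A", 1)] * 60 + [("A", 0)] * 40    # group A: 100 rows, positive rate 0.60
    + [("B", 1)] * 50 + [("B", 0)] * 50  # group B: 100 rows, positive rate 0.50
    + [("C", 1)] * 1 + [("C", 0)] * 9    # group C: 10 rows, positive rate 0.10
)

full = demographic_parity_difference(records)
reduced = demographic_parity_difference(drop_small_groups(records, min_size=20))
print(f"all groups: {full:.2f}")
print(f"small groups dropped: {reduced:.2f}")
```

In this toy setup, dropping the 10-row group C shrinks the measured gap from 0.50 to 0.10, so a model would look considerably fairer purely because of a preprocessing decision rather than any change in its behavior.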
Pages: 642-659
Number of pages: 18