Lazy Data Practices Harm Fairness Research

Authors
Simson, Jan [1, 2]
Fabris, Alessandro [3]
Kern, Christoph [1, 2, 4]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Munich, Germany
[2] Munich Ctr Machine Learning MCML, Munich, Germany
[3] Max Planck Inst Secur & Privacy, Bochum, Germany
[4] Univ Maryland, College Pk, MD 20742 USA
Source
PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024 | 2024
Keywords
critical data studies; protected groups; fair ML generalization; reproducibility; DISCRIMINATION; STEREOTYPES; ISLAMOPHOBIA; RELIGION; MERIT;
DOI
10.1145/3630106.3658931
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field by highlighting shortcomings and proposing recommendations for improvement. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research. By conducting exemplary analyses on the utilization of prominent datasets, we demonstrate how unreflective data decisions disproportionately affect minority groups, fairness metrics, and resultant model comparisons. Additionally, we identify supplementary factors such as limitations in publicly available data, privacy considerations, and a general lack of awareness, which exacerbate these challenges. To address these issues, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
Pages: 642-659
Page count: 18