Lazy Data Practices Harm Fairness Research

Cited by: 0
Authors
Simson, Jan [1, 2]
Fabris, Alessandro [3]
Kern, Christoph [1, 2, 4]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Munich, Germany
[2] Munich Ctr Machine Learning MCML, Munich, Germany
[3] Max Planck Inst Secur & Privacy, Bochum, Germany
[4] Univ Maryland, College Pk, MD 20742 USA
Source
PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024 | 2024
Keywords
critical data studies; protected groups; fair ML generalization; reproducibility; DISCRIMINATION; STEREOTYPES; ISLAMOPHOBIA; RELIGION; MERIT
DOI
10.1145/3630106.3658931
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field by highlighting shortcomings and proposing recommendations for improvement. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research. By conducting exemplary analyses on the utilization of prominent datasets, we demonstrate how unreflective data decisions disproportionately affect minority groups, fairness metrics, and resultant model comparisons. Additionally, we identify supplementary factors such as limitations in publicly available data, privacy considerations, and a general lack of awareness, which exacerbate these challenges. To address these issues, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
Pages: 642-659
Page count: 18
Related papers
50 in total
  • [41] Best research practices for using the Implicit Association Test
    Greenwald, Anthony G.
    Brendl, Miguel
    Cai, Huajian
    Cvencek, Dario
    Dovidio, John F.
    Friese, Malte
    Hahn, Adam
    Hehman, Eric
    Hofmann, Wilhelm
    Hughes, Sean
    Hussey, Ian
    Jordan, Christian
    Kirby, Teri A.
    Lai, Calvin K.
    Lang, Jonas W. B.
    Lindgren, Kristen P.
    Maison, Dominika
    Ostafin, Brian D.
    Rae, James R.
    Ratliff, Kate A.
    Spruyt, Adriaan
    Wiers, Reinout W.
    BEHAVIOR RESEARCH METHODS, 2022, 54 (03) : 1161 - 1180
  • [42] The case for open research in entomology: Reducing harm, refining reproducibility and advancing insect science
    Cuff, Jordan P.
    Barrett, Meghan
    Gray, Helen
    Fox, Charles
    Watt, Allan
    Aime, Emilie
    AGRICULTURAL AND FOREST ENTOMOLOGY, 2024, 26 (03) : 285 - 295
  • [43] Conceptualizing Religious Practices in Psychological Research: Problems and Prospects
    Slife, Brent D.
    Reber, Jeffrey S.
    PASTORAL PSYCHOLOGY, 2012, 61 (5-6) : 735 - 746
  • [44] Nine best practices for research software registries and repositories
    Garijo, Daniel
    Menager, Herve
    Hwang, Lorraine
    Trisovic, Ana
    Hucka, Michael
    Morrell, Thomas
    Allen, Alice
    PEERJ COMPUTER SCIENCE, 2022, 8
  • [45] Data Practices for Studying the Impacts of Environmental Amenities and Hazards with Nationwide Property Data
    Nolte, Christoph
    Boyle, Kevin J.
    Chaudhry, Anita M.
    Clapp, Christopher
    Guignet, Dennis
    Hennighausen, Hannah
    Kushner, Ido
    Liao, Yanjun
    Mamun, Saleh
    Pollack, Adam
    Richardson, Jesse
    Sundquist, Shelby
    Swedberg, Kristen
    Uhl, Johannes H.
    LAND ECONOMICS, 2024, 100 (01) : 200 - 221
  • [47] Questionable Research Practices and Open Science in Quantitative Criminology
    Chin, Jason M.
    Pickett, Justin T.
    Vazire, Simine
    Holcombe, Alex O.
    JOURNAL OF QUANTITATIVE CRIMINOLOGY, 2023, 39 (01) : 21 - 51
  • [48] Equalizing Credit Opportunity in Algorithms: Aligning Algorithmic Fairness Research with US Fair Lending Regulation
    Kumar, I. Elizabeth
    Hines, Keegan E.
    Dickerson, John P.
    PROCEEDINGS OF THE 2022 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, AIES 2022, 2022, : 357 - 368
  • [49] Best practices in vehicle stop data collection and analysis
    Tillyer, Rob
    Engel, Robin S.
    Cherkauskas, Jennifer Calnon
    POLICING-AN INTERNATIONAL JOURNAL OF POLICE STRATEGIES & MANAGEMENT, 2010, 33 (01) : 69 - 92
  • [50] Moving Towards FAIR Data Practices in Pharmacy Education
    McLaughlin, Jacqueline E.
    Tropsha, Alexander
    Nicolazzo, Joseph A.
    Crescenzi, Anita
    Brouwer, Kim L. R.
    AMERICAN JOURNAL OF PHARMACEUTICAL EDUCATION, 2022, 86 (03) : 163 - 166