Lazy Data Practices Harm Fairness Research

Cited by: 0

Authors:

Simson, Jan ^{[1,2]}

Fabris, Alessandro ^{[3]}

Kern, Christoph ^{[1,2,4]}

Affiliations:

[1] Ludwig Maximilians Univ Munchen, Munich, Germany

[2] Munich Ctr Machine Learning MCML, Munich, Germany

[3] Max Planck Inst Secur & Privacy, Bochum, Germany

[4] Univ Maryland, College Pk, MD 20742 USA

Source:

PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024 | 2024

Keywords:

critical data studies; protected groups; fair ML generalization; reproducibility; DISCRIMINATION; STEREOTYPES; ISLAMOPHOBIA; RELIGION; MERIT

DOI:

10.1145/3630106.3658931

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Data practices shape research and practice on fairness in machine learning (fair ML). Critical data studies offer important reflections and critiques for the responsible advancement of the field by highlighting shortcomings and proposing recommendations for improvement. In this work, we present a comprehensive analysis of fair ML datasets, demonstrating how unreflective yet common practices hinder the reach and reliability of algorithmic fairness findings. We systematically study protected information encoded in tabular datasets and their usage in 280 experiments across 142 publications. Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research. By conducting exemplary analyses on the utilization of prominent datasets, we demonstrate how unreflective data decisions disproportionately affect minority groups, fairness metrics, and resultant model comparisons. Additionally, we identify supplementary factors such as limitations in publicly available data, privacy considerations, and a general lack of awareness, which exacerbate these challenges. To address these issues, we propose a set of recommendations for data usage in fairness research centered on transparency and responsible inclusion. This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.

Pages: 642-659

Page count: 18
