Citizens' data afterlives: Practices of dataset inclusion in machine learning for public welfare

Cited by: 1

Authors:

Ratner, Helene Friis [1,2]

Thylstrup, Nanna Bonde [2]

Affiliations:

[1] Aarhus Univ, Danish Sch Educ DPU, Tuborgvej 164, DK-2400 Copenhagen N, Denmark

[2] Univ Copenhagen, Dept Arts & Cultural Studies, Karen Blixensvej 1, DK-2300 Copenhagen, Denmark

Source:

AI & SOCIETY | 2024, Vol. 40, Issue 3

Keywords:

Machine learning; Welfare state; Data afterlives; Dataset negotiations; DATABASES; CHILD; CARE;

DOI:

10.1007/s00146-024-01920-4

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Public sector adoption of AI techniques in welfare systems recasts historic national data as resource for machine learning. In this paper, we examine how the use of register data for development of predictive models produces new 'afterlives' for citizen data. First, we document a Danish research project's practical efforts to develop an algorithmic decision-support model for social workers to classify children's risk of maltreatment. Second, we outline the tensions emerging from project members' negotiations about which datasets to include. Third, we identify three types of afterlives for citizen data in machine learning projects: (1) data afterlives for training and testing the algorithm, acting as 'ground truth' for inferring futures, (2) data afterlives for validating the algorithmic model, acting as markers of robustness, and (3) data afterlives for improving the model's fairness, valuated for reasons of data ethics. We conclude by discussing how, on one hand, these afterlives engender new ethical relations between state and citizens; and how they, on the other hand, also articulate an alternative view on the value of datasets, posing interesting contrasts between machine learning projects developed within the context of the Danish welfare state and mainstream corporate AI discourses of the bigger, the better.


Pages: 1183-1193

Number of pages: 11
