Citizens' data afterlives: Practices of dataset inclusion in machine learning for public welfare

Times cited: 1
Authors
Ratner, Helene Friis [1, 2]
Thylstrup, Nanna Bonde [2]
Affiliations
[1] Aarhus Univ, Danish Sch Educ DPU, Tuborgvej 164, DK-2400 Copenhagen N, Denmark
[2] Univ Copenhagen, Dept Arts & Cultural Studies, Karen Blixensvej 1, DK-2300 Copenhagen, Denmark
Keywords
Machine learning; Welfare state; Data afterlives; Dataset negotiations; Databases; Child; Care
DOI
10.1007/s00146-024-01920-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Public sector adoption of AI techniques in welfare systems recasts historic national data as a resource for machine learning. In this paper, we examine how the use of register data for the development of predictive models produces new 'afterlives' for citizen data. First, we document a Danish research project's practical efforts to develop an algorithmic decision-support model that helps social workers classify children's risk of maltreatment. Second, we outline the tensions emerging from project members' negotiations about which datasets to include. Third, we identify three types of afterlives for citizen data in machine learning projects: (1) data afterlives for training and testing the algorithm, acting as 'ground truth' for inferring futures; (2) data afterlives for validating the algorithmic model, acting as markers of robustness; and (3) data afterlives for improving the model's fairness, valued for reasons of data ethics. We conclude by discussing how, on the one hand, these afterlives engender new ethical relations between state and citizens and how, on the other hand, they also articulate an alternative view of the value of datasets, posing interesting contrasts between machine learning projects developed within the context of the Danish welfare state and mainstream corporate AI discourses of 'the bigger, the better'.
Pages: 1183-1193
Number of pages: 11