Citizens' data afterlives: Practices of dataset inclusion in machine learning for public welfare

Cited by: 1
Authors
Ratner, Helene Friis [1, 2]
Thylstrup, Nanna Bonde [2]
Affiliations
[1] Aarhus Univ, Danish Sch Educ DPU, Tuborgvej 164, DK-2400 Copenhagen N, Denmark
[2] Univ Copenhagen, Dept Arts & Cultural Studies, Karen Blixensvej 1, DK-2300 Copenhagen, Denmark
Keywords
Machine learning; Welfare state; Data afterlives; Dataset negotiations; DATABASES; CHILD; CARE
DOI
10.1007/s00146-024-01920-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Public sector adoption of AI techniques in welfare systems recasts historic national data as resource for machine learning. In this paper, we examine how the use of register data for development of predictive models produces new 'afterlives' for citizen data. First, we document a Danish research project's practical efforts to develop an algorithmic decision-support model for social workers to classify children's risk of maltreatment. Second, we outline the tensions emerging from project members' negotiations about which datasets to include. Third, we identify three types of afterlives for citizen data in machine learning projects: (1) data afterlives for training and testing the algorithm, acting as 'ground truth' for inferring futures, (2) data afterlives for validating the algorithmic model, acting as markers of robustness, and (3) data afterlives for improving the model's fairness, valuated for reasons of data ethics. We conclude by discussing how, on one hand, these afterlives engender new ethical relations between state and citizens; and how they, on the other hand, also articulate an alternative view on the value of datasets, posing interesting contrasts between machine learning projects developed within the context of the Danish welfare state and mainstream corporate AI discourses of the bigger, the better.
Pages: 1183-1193
Page count: 11