Citizens' data afterlives: Practices of dataset inclusion in machine learning for public welfare

Cited by: 1
Authors
Ratner, Helene Friis [1 ,2 ]
Thylstrup, Nanna Bonde [2 ]
Affiliations
[1] Aarhus Univ, Danish Sch Educ DPU, Tuborgvej 164, DK-2400 Copenhagen N, Denmark
[2] Univ Copenhagen, Dept Arts & Cultural Studies, Karen Blixensvej 1, DK-2300 Copenhagen, Denmark
Keywords
Machine learning; Welfare state; Data afterlives; Dataset negotiations; DATABASES; CHILD; CARE
DOI
10.1007/s00146-024-01920-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Public sector adoption of AI techniques in welfare systems recasts historic national data as resource for machine learning. In this paper, we examine how the use of register data for development of predictive models produces new 'afterlives' for citizen data. First, we document a Danish research project's practical efforts to develop an algorithmic decision-support model for social workers to classify children's risk of maltreatment. Second, we outline the tensions emerging from project members' negotiations about which datasets to include. Third, we identify three types of afterlives for citizen data in machine learning projects: (1) data afterlives for training and testing the algorithm, acting as 'ground truth' for inferring futures, (2) data afterlives for validating the algorithmic model, acting as markers of robustness, and (3) data afterlives for improving the model's fairness, valuated for reasons of data ethics. We conclude by discussing how, on one hand, these afterlives engender new ethical relations between state and citizens; and how they, on the other hand, also articulate an alternative view on the value of datasets, posing interesting contrasts between machine learning projects developed within the context of the Danish welfare state and mainstream corporate AI discourses of the bigger, the better.
Pages: 1183-1193
Page count: 11
Related papers
50 in total
  • [41] The effect of dataset size and the process of big data mining for investigating solar-thermal desalination by using machine learning
    Peng, Guilong
    Sun, Senshan
    Xu, Zhenwei
    Du, Juxin
    Qin, Yangjun
    Sharshir, Swellam W.
    Kandeal, A. W.
    Kabeel, A. E.
    Yang, Nuo
    INTERNATIONAL JOURNAL OF HEAT AND MASS TRANSFER, 2025, 236
  • [42] Counting Passengers in Public Buses by Sensing Carbon Dioxide Concentration: Data Collection and Machine Learning
    Li, Tengyue
    Fong, Simon
    Yang, Lili
    BDIOT 2018: PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON BIG DATA AND INTERNET OF THINGS, 2018, : 43 - 48
  • [43] Harmonization of supervised machine learning practices for efficient source attribution of Listeria monocytogenes based on genomic data
    Castelli, Pierluigi
    De Ruvo, Andrea
    Bucciacchio, Andrea
    D'Alterio, Nicola
    Camma, Cesare
    Di Pasquale, Adriano
    Radomski, Nicolas
    BMC GENOMICS, 2023, 24 (01)
  • [45] Mapping Methane-The Impact of Dairy Farm Practices on Emissions Through Satellite Data and Machine Learning
    Bi, Hanqing
    Neethirajan, Suresh
    CLIMATE, 2024, 12 (12)
  • [46] Data Collection, Statistical Analysis And Machine Learning Studies Of Cancer Dataset From North Costal Districts Of AP, India
    Vital, T. Panduranga
    Raju, G. S. V. Prasada
    Rao, I. S. Siva
    Kumar, A. D. Praveen
    INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATION AND CONVERGENCE (ICCC 2015), 2015, 48 : 706 - 714
  • [47] Dataset size versus homogeneity: A machine learning study on pooling intervention data in e-mental health dropout predictions
    Zantvoort, Kirsten
    Isacsson, Nils Hentati
    Funk, Burkhardt
    Kaldo, Viktor
    DIGITAL HEALTH, 2024, 10
  • [48] Reducing data requirement for accurate photovoltaic power prediction using hybrid machine learning-physical model on diverse dataset
    Syauqi, Ahmad
    Eldi, Gian Pavian
    Andika, Riezqa
    Lim, Hankwon
    SOLAR ENERGY, 2024, 279
  • [49] A new perspective for longitudinal measurement and analysis of public education in Brazil based on open data and machine learning
    Silva, Matheus
    Ferreira, Abilio
    Alves, Karine
    Valenca, George
    Brito, Kellyton
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON THEORY AND PRACTICE OF ELECTRONIC GOVERNANCE, ICEGOV 2024, 2024, : 130 - 138
  • [50] Medical Image Data and Datasets in the Era of Machine Learning—Whitepaper from the 2016 C-MIMI Meeting Dataset Session
    Marc D. Kohli
    Ronald M. Summers
    J. Raymond Geis
    Journal of Digital Imaging, 2017, 30 : 392 - 399