Machine learning and the politics of synthetic data

Cited by: 29
Authors
Jacobsen, Benjamin N. [1]
Affiliations
[1] Univ Durham, Dept Geog, South Rd, Durham DH1 3LE, England
Funding
European Research Council
Keywords
Machine learning; data; algorithms; risk; ethics; variability;
DOI
10.1177/20539517221145372
Chinese Library Classification
C [Social Sciences, General];
Discipline classification code
03 ; 0303 ;
Abstract
Machine-learning algorithms have become deeply embedded in contemporary society. As such, ample attention has been paid to the contents, biases, and underlying assumptions of the training datasets that many algorithmic models are trained on. Yet, what happens when algorithms are trained on data that are not real, but instead data that are 'synthetic', not referring to real persons, objects, or events? Increasingly, synthetic data are being incorporated into the training of machine-learning algorithms for use in various societal domains. There is currently little understanding, however, of the role played by and the ethicopolitical implications of synthetic training data for machine-learning algorithms. In this article, I explore the politics of synthetic data through two central aspects: first, synthetic data promise to emerge as a rich source of exposure to variability for the algorithm. Second, the paper explores how synthetic data promise to place algorithms beyond the realm of risk. I propose that an analysis of these two areas will help us better understand the ways in which machine-learning algorithms are envisioned in the light of synthetic data, but also how synthetic training data actively reconfigure the conditions of possibility for machine learning in contemporary society.
Pages: 12
Related papers
50 in total
  • [31] Machine Learning Based Flashover Prediction Models Using Synthetic Data and Fire Images
    Yansheng Song
    Guang Xiao
    Haoran Wang
    Fire Technology, 2025, 61 (4) : 2389 - 2413
  • [32] Data Learning: Integrating Data Assimilation and Machine Learning
    Buizza, Caterina
    Casas, Cesar Quilodran
    Nadler, Philip
    Mack, Julian
    Marrone, Stefano
    Titus, Zainab
    Le Cornec, Clemence
    Heylen, Evelyn
    Dur, Tolga
    Ruiz, Luis Baca
    Heaney, Claire
    Lopez, Julio Amador Diaz
    Kumar, K. S. Sesh
    Arcucci, Rossella
    JOURNAL OF COMPUTATIONAL SCIENCE, 2022, 58
  • [33] Effect of Balancing Data Using Synthetic Data on the Performance of Machine Learning Classifiers for Intrusion Detection in Computer Networks
    Dina, Ayesha Siddiqua
    Siddique, A. B.
    Manivannan, D.
    IEEE ACCESS, 2022, 10 : 96731 - 96747
  • [34] Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing
    Rankin, Debbie
    Black, Michaela
    Bond, Raymond
    Wallace, Jonathan
    Mulvenna, Maurice
    Epelde, Gorka
    JMIR MEDICAL INFORMATICS, 2020, 8 (07)
  • [35] Comparative analysis of machine learning and ensemble approaches for hepatitis B prediction using data mining with synthetic minority oversampling technique
    Alizargar, Azadeh
    Chang, Yang-Lang
    Tan, Tan-Hsu
    Liu, Tsung-Yu
    HEALTH AND TECHNOLOGY, 2024, 14 (01) : 109 - 118
  • [36] Molecular Machine Learning: The Future of Synthetic Chemistry?
    Pflueger, Philipp M.
    Glorius, Frank
    ANGEWANDTE CHEMIE-INTERNATIONAL EDITION, 2020, 59 (43) : 18860 - 18865
  • [37] Machine learning for synthetic biology: Methods and applications
    Hu, Ruyun
    Zhang, Songya
    Meng, Hailin
    Yu, Han
    Zhang, Jianzhi
    Luo, Xiaozhou
    Si, Tong
    Liu, Chenli
    Qiao, Yu
    CHINESE SCIENCE BULLETIN-CHINESE, 2021, 66 (03) : 284 - 299
  • [38] Data Augmentation Using Synthetic Lesions Improves Machine Learning Detection of Microbleeds from MRI
    Momeni, Saba
    Fazllolahi, Amir
    Bourgeat, Pierrick
    Raniga, Parnesh
    Yates, Paul
    Yassi, Nawaf
    Desmond, Patricia
    Fripp, Jurgen
    Gao, Yongsheng
    Salvado, Olivier
    SIMULATION AND SYNTHESIS IN MEDICAL IMAGING, 2018, 11037 : 12 - 19
  • [39] Generating Synthetic Sensor Data to Facilitate Machine Learning Paradigm for Prediction of Building Fire Hazard
    Tam, Wai Cheong
    Fu, Eugene Yujun
    Peacock, Richard
    Reneke, Paul
    Wang, Jun
    Li, Jiajia
    Cleary, Thomas
    FIRE TECHNOLOGY, 2023, 59 (06) : 3027 - 3048
  • [40] Energy cost and machine learning accuracy impact of k-anonymisation and synthetic data techniques
    de Reus, Pepijn
    Oprescu, Ana
    van Elsen, Koen
    2023 INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABILITY, ICT4S, 2023, : 57 - 65