Machine learning and the politics of synthetic data

Cited by: 29
Author
Jacobsen, Benjamin N. [1]
Affiliation
[1] Univ Durham, Dept Geog, South Rd, Durham DH1 3LE, England
Funding
European Research Council
Keywords
Machine learning; data; algorithms; risk; ethics; variability
DOI
10.1177/20539517221145372
Chinese Library Classification
C [Social Sciences, General]
Discipline classification codes
03; 0303
Abstract
Machine-learning algorithms have become deeply embedded in contemporary society. As such, ample attention has been paid to the contents, biases, and underlying assumptions of the training datasets that many algorithmic models are trained on. Yet, what happens when algorithms are trained on data that are not real, but instead data that are 'synthetic', not referring to real persons, objects, or events? Increasingly, synthetic data are being incorporated into the training of machine-learning algorithms for use in various societal domains. There is currently little understanding, however, of the role played by and the ethicopolitical implications of synthetic training data for machine-learning algorithms. In this article, I explore the politics of synthetic data through two central aspects: first, synthetic data promise to emerge as a rich source of exposure to variability for the algorithm. Second, the paper explores how synthetic data promise to place algorithms beyond the realm of risk. I propose that an analysis of these two areas will help us better understand the ways in which machine-learning algorithms are envisioned in the light of synthetic data, but also how synthetic training data actively reconfigure the conditions of possibility for machine learning in contemporary society.
Pages: 12
Related papers
50 in total
  • [1] Machine Learning, Synthetic Data, and the Politics of Difference
    Jacobsen, Benjamin N.
    THEORY CULTURE & SOCIETY, 2025,
  • [2] Politics of data reuse in machine learning systems: Theorizing reuse entanglements
    Thylstrup, Nanna Bonde
    Hansen, Kristian Bondo
    Flyverbom, Mikkel
    Amoore, Louise
    BIG DATA & SOCIETY, 2022, 9 (02)
  • [3] A Survey of Synthetic Data Generation for Machine Learning
    Abufadda, Mohammad
    Mansour, Khalid
    2021 22ND INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2021, : 488 - 494
  • [4] Synthetic satellite telemetry data for machine learning
    Schefels, Clemens
    Schlag, Leonard
    Helmsauer, Kathrin
    CEAS SPACE JOURNAL, 2025,
  • [5] Machine Learning and Synthetic Minority Oversampling Techniques for Imbalanced Data: Improving Machine Failure Prediction
    Wah, Yap Bee
    Ismail, Azlan
    Azid, Nur Niswah Naslina
    Jaafar, Jafreezal
    Aziz, Izzatdin Abdul
    Hasan, Mohd Hilmi
    Zain, Jasni Mohamad
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03): : 4821 - 4841
  • [6] On the Utility of Synthetic Data: An Empirical Evaluation on Machine Learning Tasks
    Hittmeir, Markus
    Ekelhart, Andreas
    Mayer, Rudolf
    14TH INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY AND SECURITY (ARES 2019), 2019,
  • [7] The ethics and politics of data sets in the age of machine learning: deleting traces and encountering remains
    Thylstrup, Nanna Bonde
    MEDIA CULTURE & SOCIETY, 2022, 44 (04) : 655 - 671
  • [8] Nursing Orientation to Data Science and Machine Learning
    O'Brien, Roxanne L.
    O'Brien, Matt W.
    AMERICAN JOURNAL OF NURSING, 2021, 121 (04) : 32 - 39
  • [9] Predicting doxorubicin-induced cardiotoxicity in breast cancer: leveraging machine learning with synthetic data
    Araujo, Daniella Castro
    Simoes, Ricardo
    Sabino, Adriano de Paula
    Oliveira, Angelica Navarro de
    de Oliveira, Camila Maciel
    Veloso, Adriano Alonso
    Gomes, Karina Braga
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2025, : 1535 - 1550
  • [10] Using Imbalanced Triangle Synthetic Data for Machine Learning Anomaly Detection
    Luo, Menghua
    Wang, Ke
    Cai, Zhiping
    Liu, Anfeng
    Li, Yangyang
    Cheang, Chak Fong
    CMC-COMPUTERS MATERIALS & CONTINUA, 2019, 58 (01): : 15 - 26