Machine learning and the politics of synthetic data

Cited by: 29
Author
Jacobsen, Benjamin N. [1]
Affiliation
[1] Univ Durham, Dept Geog, South Rd, Durham DH1 3LE, England
Funding
European Research Council
Keywords
Machine learning; data; algorithms; risk; ethics; variability
DOI
10.1177/20539517221145372
Chinese Library Classification (CLC) number
C [Social sciences (general)]
Discipline classification code
03; 0303
Abstract
Machine-learning algorithms have become deeply embedded in contemporary society. Accordingly, ample attention has been paid to the contents, biases, and underlying assumptions of the datasets on which many algorithmic models are trained. Yet what happens when algorithms are trained on data that are not real but 'synthetic', that is, data that do not refer to real persons, objects, or events? Synthetic data are increasingly being incorporated into the training of machine-learning algorithms used across various societal domains. There is currently little understanding, however, of the role that synthetic training data play in machine learning or of their ethicopolitical implications. In this article, I explore the politics of synthetic data through two central aspects. First, synthetic data promise to emerge as a rich source of exposure to variability for the algorithm. Second, synthetic data promise to place algorithms beyond the realm of risk. I propose that an analysis of these two areas will help us better understand not only how machine-learning algorithms are envisioned in light of synthetic data, but also how synthetic training data actively reconfigure the conditions of possibility for machine learning in contemporary society.
Pages: 12