Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture

Authors
Klein, Jonathan [1]
Waller, Rebekah [2]
Pirk, Soeren [3]
Palubicki, Wojtek [4]
Tester, Mark [2]
Michels, Dominik L. [1]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Computat Sci Grp, Thuwal, Saudi Arabia
[2] King Abdullah Univ Sci & Technol KAUST, Ctr Desert Agr, Thuwal, Saudi Arabia
[3] Christian Albrechts Univ Kiel, Inst Comp Sci, Kiel, Germany
[4] Adam Mickiewicz Univ, Fac Math & Comp Sci, Poznan, Poland
Source
FRONTIERS IN PLANT SCIENCE | 2024 / Vol. 15
Keywords
artificial intelligence; data generation and annotation; disease detection; greenhouse farming; machine learning; synthetic data; tomato plants
DOI
10.3389/fpls.2024.1360113
Chinese Library Classification
Q94 [Botany]
Subject classification code
071001
Abstract
The rise of artificial intelligence (AI) and in particular modern machine learning (ML) algorithms during the last decade has been met with great interest in the agricultural industry. While these algorithms are undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML developments, and while promising results on synthetically generated training data have been shown, generating such data is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we instead introduce a structured, quantitative approach. Our evaluation shows superior generalization results when compared to using non-task-specific real training data and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
Pages: 16