Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture

Cited by: 0
Authors
Klein, Jonathan [1]
Waller, Rebekah [2]
Pirk, Soeren [3]
Palubicki, Wojtek [4]
Tester, Mark [2]
Michels, Dominik L. [1]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Computat Sci Grp, Thuwal, Saudi Arabia
[2] King Abdullah Univ Sci & Technol KAUST, Ctr Desert Agr, Thuwal, Saudi Arabia
[3] Christian Albrechts Univ Kiel, Inst Comp Sci, Kiel, Germany
[4] Adam Mickiewicz Univ, Fac Math & Comp Sci, Poznan, Poland
Source
FRONTIERS IN PLANT SCIENCE | 2024 / Volume 15
Keywords
artificial intelligence; data generation and annotation; disease detection; greenhouse farming; machine learning; synthetic data; tomato plants;
DOI
10.3389/fpls.2024.1360113
Chinese Library Classification number
Q94 [Botany];
Discipline classification code
071001;
Abstract
The rise of artificial intelligence (AI) and in particular modern machine learning (ML) algorithms during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML developments, and while promising results on synthetically generated training data have been shown, its generation is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained by exclusively using synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we instead introduce a structured, quantitative approach. Our evaluation shows superior generalization results when compared to using non-task-specific real training data and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
Pages: 16