Using Gaussian Copulas and Generative Adversarial Networks for Generating Synthetic Data in Beet Productivity Analysis

Times cited: 0
Authors
dos Santos, Denize Palmito [1]
Vasconcelos, Julio Cezar Souza [2]
Institutions
[1] Univ Fed Mato Grosso do Sul, Campo Grande, MS, Brazil
[2] Univ Fed Sao Paulo, Sao Jose Dos Campos, SP, Brazil
Keywords
Beet; Quadratic polynomials; Random forest; Sample scarcity
DOI
10.1007/s12355-024-01506-w
Chinese Library Classification number
S3 [Agricultural science (agronomy)]
Discipline code
0901
Abstract
In scientific research, field experiments are essential for validating theories under real conditions. However, these experiments are often limited by sample scarcity, which can compromise the robustness and interpretability of their results. Synthetic data generation offers an effective way to expand datasets, enabling more comprehensive analyses even when real data are limited. Although synthetic data are not real observations, they can preserve the mathematical and statistical properties of the real data, which makes them a valuable tool for improving analytical accuracy. This study generates synthetic data with two synthesizers: Gaussian Copulas and a Generative Adversarial Network (GAN), namely CTGAN. The dataset comes from an experiment evaluating the effect of different nitrogen (N) fertilizer levels on the dry matter production of sugar beet roots. Five N levels were tested (0, 35, 70, 105, and 140 kg/ha) in a randomized block design with three blocks of five plots each. The aim is to increase the sample size so that a wider range of statistical and machine learning models can be considered. The comparison between synthetic and real data showed that the Gaussian Copulas synthesizer outperformed the CTGAN synthesizer, as evidenced by how closely its graphical representations and model performance matched those obtained with the real data. Furthermore, the random forest model trained on data generated by Gaussian Copulas achieved better performance metrics than the one trained on CTGAN data, indicating that synthetic data can be a valuable support in the analysis of agronomic experiments.
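The Gaussian copula idea named in the abstract can be illustrated with a minimal sketch, assuming NumPy and Python's standard-library `statistics.NormalDist`. It is not the authors' implementation or any library's synthesizer API. It estimates the correlation of rank-based normal scores from the real data, draws correlated standard-normal samples, and maps them back through the empirical marginal of each column. The function names `fit_gaussian_copula` and `sample_gaussian_copula`, the use of empirical (rather than fitted parametric) marginals, and the toy response curve are hypothetical; only the layout (five N doses, three blocks, 15 plots) comes from the study.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical helper names: a sketch of the general technique, not the study's code.
_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def fit_gaussian_copula(data):
    """Estimate the copula correlation matrix from an (n, d) array of real data."""
    n = data.shape[0]
    # Pseudo-observations: column-wise ranks 1..n scaled into (0, 1);
    # ties (e.g. repeated nitrogen doses) are broken by row order.
    ranks = data.argsort(axis=0, kind="stable").argsort(axis=0) + 1
    u = ranks / (n + 1)
    # Normal scores, then their Pearson correlation matrix.
    z = np.vectorize(_STD_NORMAL.inv_cdf)(u)
    return np.corrcoef(z, rowvar=False)


def sample_gaussian_copula(data, corr, n_samples, rng, discrete_cols=()):
    """Draw synthetic rows whose marginals follow the empirical marginals of `data`."""
    n, d = data.shape
    # Correlated standard-normal draws, mapped to uniforms through the normal CDF.
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = np.vectorize(_STD_NORMAL.cdf)(z)
    synth = np.empty((n_samples, d))
    for j in range(d):
        col = np.sort(data[:, j])
        if j in discrete_cols:
            # Inverse empirical CDF: returns only values observed in the real data.
            idx = np.clip(np.ceil(u[:, j] * n).astype(int) - 1, 0, n - 1)
            synth[:, j] = col[idx]
        else:
            # Linearly interpolated empirical quantile for continuous columns.
            synth[:, j] = np.quantile(col, u[:, j])
    return synth


rng = np.random.default_rng(0)
levels = np.array([0, 35, 70, 105, 140], dtype=float)
nitrogen = np.tile(levels, 3)  # 5 doses x 3 blocks = 15 plots
# Hypothetical placeholder response, NOT the study's measurements.
dry_matter = 5 + 0.08 * nitrogen - 0.0003 * nitrogen**2 + rng.normal(0, 0.5, nitrogen.size)
real = np.column_stack([nitrogen, dry_matter])

corr = fit_gaussian_copula(real)
synthetic = sample_gaussian_copula(real, corr, n_samples=200, rng=rng, discrete_cols=(0,))
print(synthetic.shape)  # (200, 2)
```

Treating the nitrogen dose as discrete keeps every synthetic dose on one of the five design levels, while the continuous dry-matter column is interpolated between observed quantiles. Because the marginals here are empirical, synthetic values never leave the observed range; fitting parametric marginals would be one way to relax that.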
Pages: 407-417
Number of pages: 11