An Evaluation Framework for Synthetic Data Generation Models

Cited by: 0
Authors
Livieris, I. E. [1, 2]
Alimpertis, N. [1]
Domalis, G. [1]
Tsakalidis, D. [1]
Affiliations
[1] Novelcore, Athens 10436, Greece
[2] Univ Piraeus, Dept Stat & Insurance Sci, Piraeus, Greece
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, PT III, AIAI 2024 | 2024 / Vol. 713
Keywords
Synthetic data generator; evaluation framework; tabular data; statistical analysis
DOI
10.1007/978-3-031-63219-8_24
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Nowadays, synthetic data has gained popularity as a cost-efficient strategy both for data augmentation, which can improve the performance of machine learning models, and for addressing concerns related to the privacy of sensitive data. Therefore, ensuring the quality of generated synthetic data, in terms of how accurately it represents the real data, is of primary importance. In this work, we present a new framework for evaluating the ability of synthetic data generation models to produce high-quality synthetic data. The proposed approach provides strong statistical and theoretical information about the evaluation framework and the ranking of the compared models. Two use-case scenarios demonstrate the applicability of the proposed framework for evaluating the ability of synthetic data generation models to generate high-quality data.
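The abstract does not specify which statistics the framework uses. Purely as an illustration, and explicitly not the authors' method, the sketch below shows one common way to score how faithfully synthetic tabular data represents real data: compute the two-sample Kolmogorov-Smirnov (KS) statistic for each shared numeric column, average it, and rank generators from lowest (closest to the real data) to highest. The function names (`ks_statistic`, `fidelity_score`, `rank_generators`), the column-dictionary table layout and the toy data are all hypothetical.

```python
from bisect import bisect_right


def ks_statistic(real, synth):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.

    Both ECDFs are step functions that only change at observed values,
    so checking every observed value is enough to find the supremum.
    """
    a, b = sorted(real), sorted(synth)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def fidelity_score(real_table, synth_table):
    """Mean KS statistic over the numeric columns both tables share (lower is better)."""
    cols = sorted(real_table.keys() & synth_table.keys())
    return sum(ks_statistic(real_table[c], synth_table[c]) for c in cols) / len(cols)


def rank_generators(real_table, synthetic_tables):
    """Return (generator name, score) pairs sorted from most to least faithful."""
    scores = {
        name: fidelity_score(real_table, table)
        for name, table in synthetic_tables.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy data: tables stored as {column name: list of values}.
real = {"age": [23, 35, 41, 52, 60], "income": [20, 32, 45, 51, 70]}
synthetic = {
    "generator_A": {"age": [24, 34, 40, 53, 61], "income": [21, 30, 46, 50, 72]},
    "generator_B": {"age": [10, 12, 15, 18, 20], "income": [5, 6, 7, 8, 9]},
}

ranking = rank_generators(real, synthetic)
print(ranking)  # generator_A ranks first; generator_B scores 1.0 (no overlap with real data)
```

A single averaged distance yields only a point ranking; the framework described in the abstract additionally aims to provide statistical and theoretical information about the ranking of the compared models, which this sketch does not attempt.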
Pages: 320-335
Page count: 16