Synthetic image learning: Preserving performance and preventing Membership Inference Attacks

Cited by: 0
Authors
Lomurno, Eugenio [1]
Matteucci, Matteo [1]
Affiliations
[1] Politecn Milan, Dept Elect Informat & Bioengn, I-20133 Milan, Italy
Keywords
Generative deep learning; Dataset generation; Classification Accuracy Score; Privacy; Membership Inference Attack; Generative Knowledge Distillation; Knowledge Recycling
DOI
10.1016/j.patrec.2025.02.003
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Generative artificial intelligence has transformed the generation of synthetic data, providing innovative solutions to challenges like data scarcity and privacy, which are particularly critical in fields such as medicine. However, the effective use of this synthetic data to train high-performance models remains a significant challenge. This paper addresses this issue by introducing Knowledge Recycling (KR), a pipeline designed to optimise the generation and use of synthetic data for training downstream classifiers. At the heart of this pipeline is Generative Knowledge Distillation, the proposed technique that significantly improves the quality and usefulness of the information provided to classifiers through a synthetic dataset regeneration and soft labelling mechanism. The KR pipeline has been tested on a variety of datasets, with a focus on six highly heterogeneous medical image datasets, ranging from retinal images to organ scans. The results show a significant reduction in the performance gap between models trained on real and synthetic data, with models based on synthetic data outperforming those trained on real data in some cases. Furthermore, the resulting models show almost complete immunity to Membership Inference Attacks, manifesting privacy properties missing in models trained with conventional techniques.
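The abstract does not spell out how the soft labelling mechanism is implemented, so the following is only a minimal sketch of the general idea under stated assumptions: a teacher classifier (assumed to have been trained on real data) assigns probability vectors to synthetic samples, and a student is trained on those soft labels alone. The linear teacher, the Gaussian stand-in for generator output, the temperature value and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over the class axis.
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_cross_entropy(student_logits, soft_targets):
    # Cross-entropy between the student's predictions and the soft labels.
    log_p = np.log(softmax(student_logits) + 1e-12)
    return float(-(soft_targets * log_p).sum(axis=1).mean())

def train_student(x, soft_targets, steps=200, lr=0.5):
    # Plain gradient descent on a linear student (an assumption made for
    # brevity). The gradient of soft cross-entropy w.r.t. the logits is
    # (p - q), where p is the student output and q the soft label.
    n, d = x.shape
    k = soft_targets.shape[1]
    w = np.zeros((d, k))
    for _ in range(steps):
        p = softmax(x @ w)
        w -= lr * x.T @ (p - soft_targets) / n
    return w

rng = np.random.default_rng(0)
x_syn = rng.normal(size=(256, 8))    # hypothetical stand-in for generated samples
w_teacher = rng.normal(size=(8, 3))  # hypothetical stand-in for a teacher trained on real data

# Soft labelling: the teacher's probability vector replaces a hard class label.
soft_targets = softmax(x_syn @ w_teacher, temperature=2.0)

# The student only ever sees synthetic samples and their soft labels.
w_student = train_student(x_syn, soft_targets)
loss_before = soft_cross_entropy(x_syn @ np.zeros((8, 3)), soft_targets)
loss_after = soft_cross_entropy(x_syn @ w_student, soft_targets)
```

In this toy setup the student never touches real samples, which is the intuition behind the privacy claim in the abstract; whether that translates into resistance to Membership Inference Attacks in practice is something the paper evaluates and this sketch does not.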
Pages: 52-58
Page count: 7