Advances in diffusion models for image data augmentation: a review of methods, models, evaluation metrics and future research directions

Cited by: 4
Authors
Alimisis, Panagiotis [1]
Mademlis, Ioannis [1]
Radoglou-Grammatikis, Panagiotis [2, 3]
Sarigiannidis, Panagiotis [2]
Papadopoulos, Georgios Th. [1]
Affiliations
[1] Harokopio Univ Athens, Dept Informat & Telemat, Thiseos 70, Attiki, Athens 17676, Greece
[2] Univ Western Macedonia, Dept Elect & Comp Engn, Act Urban Planning Zone, Kozani 50150, Kozani, Greece
[3] K3Y, Vitosha Quarter, Bl 9, BG-1700 Sofia, Bulgaria
Keywords
Image data augmentation; Diffusion models; Generative artificial intelligence; Evaluation metrics
DOI
10.1007/s10462-025-11116-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image data augmentation is a critical methodology in modern computer vision, since it enhances the diversity and quality of training datasets, thereby improving the performance and robustness of machine learning models in downstream tasks. Augmentation approaches can also be used to edit a given image in a context- and semantics-aware way. Diffusion Models (DMs), one of the most recent and promising classes of methods in generative Artificial Intelligence (AI), have emerged as a powerful tool for image data augmentation: by learning the underlying data distribution, they can generate realistic and diverse images. This study presents a systematic, comprehensive and in-depth review of DM-based approaches to image augmentation, covering a wide range of strategies, tasks and applications. It first analyzes the fundamental principles, model architectures and training strategies of DMs. It then introduces a taxonomy of the relevant image augmentation methods, focusing on techniques for semantic manipulation, personalization and adaptation, and application-specific augmentation tasks. Next, it analyzes performance assessment methodologies and the corresponding evaluation metrics. Finally, it discusses current challenges and future research directions in the field.
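The abstract describes DMs as generating images by learning the underlying data distribution. As a minimal, hedged illustration of the noising half of that process, the sketch below implements the standard DDPM forward process in closed form, q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I). It assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps, a common choice that is not taken from this review, and the function names `linear_beta_schedule`, `alpha_bars` and `q_sample` are hypothetical. The learned reverse (denoising) network that actually produces augmented images is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng=random):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in one step.

    x0 is a flat list of pixel values and t is a 0-based timestep index.
    Returns (x_t, eps) so that a denoiser could be trained to predict eps.
    """
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


# Usage: noise a toy 4-pixel "image" up to the final timestep.
betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
x0 = [0.5, -0.2, 0.9, 0.0]  # toy pixel values scaled to [-1, 1]
xt, eps = q_sample(x0, t=999, abar=abar, rng=random.Random(0))
# alpha_bar_T is close to 0, so x_T is almost pure Gaussian noise.
print(f"alpha_bar_T = {abar[-1]:.2e}")
```

Sampling x_t directly from x_0, rather than iterating t noising steps, is what makes training on randomly chosen timesteps cheap. A DM-based augmentation pipeline would then run a trained denoiser in the reverse direction, which is outside the scope of this sketch.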
Pages: 55
References: 272