Using Diffusion Models for Dataset Generation: Prompt Engineering vs. Fine-Tuning

Times cited: 0

Authors:

Voetman, Roy [1]

van Meekeren, Alexander [1]

Aghaei, Maya [1]

Dijkstra, Klaas [1]

Affiliation:

[1] NHL Stenden Univ Appl Sci, Professorship Comp Vis & Data Sci, Rengerslaan 8-10, NL-8917 DD Leeuwarden, Netherlands

Source:

COMPUTER ANALYSIS OF IMAGES AND PATTERNS, CAIP 2023, PT I | 2023 / Vol. 14184

Keywords:

Stable Diffusion; Prompt Engineering; DreamBooth; Detection;

DOI:

10.1007/978-3-031-44237-7_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Despite the notable achievements of deep object detection models, a major challenge remains: the need for vast amounts of training data. Acquiring such real-world data is laborious, which has prompted the exploration of new research directions such as synthetic data generation. In this study, we assess the capability of two distinct synthetic data generation techniques utilising Stable Diffusion, namely (1) prompt engineering of an established model and (2) fine-tuning of a pretrained model. With these techniques, we generate two training datasets, manually annotate them, and train separate object detection models, which are then tested on a real-world detection dataset. The results demonstrate that prompt engineering and fine-tuning perform similarly when tested on a set of 331 real-world images in the context of apple detection in apple orchards. Compared with a baseline model trained on real-world images, the two approaches deviate from the baseline by only 0.07 and 0.08 in average precision (AP). Qualitative results demonstrate that both models accurately predict the locations of the apples, except in instances of heavy shading. This study distinguishes itself from prior research by focusing on object detection instead of image classification. Furthermore, we are the first to apply diffusion model fine-tuning in the context of dataset generation. Our findings underscore the potential of synthetic data generation as a viable alternative to the laborious collection of extensive training data for object detection models.
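The abstract reports results as deviations in average precision (AP) from a model trained on real images. As a rough illustration of how a single-class detection AP of this kind is commonly computed, here is a minimal Python sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2), greedy matching at a hypothetical IoU threshold of 0.5, and the all-point interpolation used by PASCAL VOC from 2010 onward. The record does not state which AP variant, IoU threshold, or evaluation code the authors used (detector frameworks such as YOLOv5 ship their own metric code), and the names `iou` and `average_precision` are illustrative only.

```python
from typing import Dict, List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[str, float, Box]],  # (image_id, confidence, box)
    ground_truth: Dict[str, List[Box]],        # image_id -> annotated boxes
    iou_thr: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> float:
    """Single-class AP with greedy matching and all-point interpolation."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}

    # Walk detections from most to least confident, labelling each TP or FP.
    tp_flags: List[bool] = []
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # TP only if it overlaps a not-yet-matched ground-truth box enough;
        # duplicate hits on an already matched box count as false positives.
        is_tp = best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]
        if is_tp:
            matched[img][best_j] = True
        tp_flags.append(is_tp)

    # Cumulative precision and recall after each ranked detection.
    recalls: List[float] = []
    precisions: List[float] = []
    tp = 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        recalls.append(tp / n_gt)
        precisions.append(tp / k)

    # Make precision non-increasing in recall, then sum the area under the curve.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))
```

Under these assumptions, a deviation like the reported 0.07 would be the difference between two such scores on the same test images, e.g. `average_precision(baseline_dets, gt) - average_precision(synthetic_dets, gt)`, where `baseline_dets`, `synthetic_dets`, and `gt` are hypothetical inputs.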

Pages: 143-153

Number of pages: 11
