Prompt-Based Learning for Image Variation Using Single Image Multi-Scale Diffusion Models

Cited: 0
Authors
Park, Jiwon [1 ]
Jeong, Dasol [2 ]
Lee, Hyebean [2 ]
Han, Seunghee [2 ]
Paik, Joonki [1 ,2 ]
Affiliations
[1] Chung Ang Univ, Dept Artificial Intelligence, Seoul 06974, South Korea
[2] Chung Ang Univ, Dept Image, Seoul 06974, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Funding
National Research Foundation of Singapore;
Keywords
Training; Computational modeling; Periodic structures; Diffusion models; Data models; Image synthesis; Adaptation models; Noise reduction; Feature extraction; Context modeling; Single image generation; Prompt-based learning; Text-guided image editing
DOI
10.1109/ACCESS.2024.3487215
CLC Number
TP [Automation and Computer Technology];
Subject Classification Code
0812;
Abstract
In this paper, we propose a multi-scale diffusion framework with prompt-based learning that performs image variation and text-guided editing from a single input image. Our approach captures the detailed internal information of a single image, enabling diverse variations while preserving the features of the original. Integrating a diffusion U-Net within the multi-scale framework allows the model to accurately learn the quality and internal structure of the image at every scale. For text-guided editing, we use a pre-trained Bootstrapped Language-Image Pretraining (BLIP) model to generate candidate prompts and, drawing on the prior knowledge of Contrastive Language-Image Pretraining (CLIP), feed the prompt that most closely matches the input image into training. To improve accuracy in the editing stage, we design a contrastive loss that strengthens the relevance between the prompt and the image, improving the alignment learned between text and images. Extensive experiments demonstrate the method's effectiveness on text-guided editing tasks, showing that it significantly improves the performance of single-image generative models and opens new possibilities for text-based image editing.
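The authors' implementation is not included in this record. As a minimal sketch of the prompt-generation and selection step described above, assuming the Hugging Face transformers library with the Salesforce/blip-image-captioning-base and openai/clip-vit-base-patch32 checkpoints and a hypothetical input file input.jpg, the pipeline could look like this:

```python
# Sketch: sample candidate prompts with BLIP, then rank them by CLIP
# image-text similarity and keep the closest match (not the authors' code).
import torch
from PIL import Image
from transformers import (BlipForConditionalGeneration, BlipProcessor,
                          CLIPModel, CLIPProcessor)

device = "cuda" if torch.cuda.is_available() else "cpu"
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)

image = Image.open("input.jpg").convert("RGB")  # hypothetical input image

# Sample several diverse captions from BLIP as candidate prompts.
blip_inputs = blip_proc(image, return_tensors="pt").to(device)
ids = blip.generate(**blip_inputs, do_sample=True,
                    num_return_sequences=5, max_new_tokens=30)
prompts = [blip_proc.decode(seq, skip_special_tokens=True) for seq in ids]

# Score every prompt against the image with CLIP; higher logit = closer match.
clip_inputs = clip_proc(text=prompts, images=image,
                        return_tensors="pt", padding=True).to(device)
with torch.no_grad():
    logits = clip(**clip_inputs).logits_per_image  # shape (1, num_prompts)
best_prompt = prompts[logits.argmax().item()]     # fed into training
```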
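The exact formulation of the paper's contrastive loss is likewise not given in this record. A standard symmetric, CLIP-style InfoNCE loss over paired image and text embeddings, which the description suggests, can be sketched as follows; the temperature value is an assumption:

```python
import torch
import torch.nn.functional as F

def prompt_image_contrastive_loss(image_emb: torch.Tensor,
                                  text_emb: torch.Tensor,
                                  temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (B, D) tensors where row i of each is a matching pair.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B) cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matching pairs lie on the diagonal; off-diagonal entries act as negatives,
    # so minimizing the loss pulls each prompt toward its own image and away
    # from the other images in the batch.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Minimizing a loss of this kind alongside the diffusion objective tightens the relevance between the selected prompt and the image, which is the role the abstract attributes to the contrastive term.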
Pages: 158810-158823
Page count: 14