Prompt-Based Learning for Image Variation Using Single Image Multi-Scale Diffusion Models

Cited by: 0
Authors
Park, Jiwon [1 ]
Jeong, Dasol [2 ]
Lee, Hyebean [2 ]
Han, Seunghee [2 ]
Paik, Joonki [1 ,2 ]
Affiliations
[1] Chung Ang Univ, Dept Artificial Intelligence, Seoul 06974, South Korea
[2] Chung Ang Univ, Dept Image, Seoul 06974, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Funding
National Research Foundation of Singapore;
Keywords
Training; Computational modeling; Periodic structures; Diffusion models; Data models; Image synthesis; Adaptation models; Noise reduction; Feature extraction; Context modeling; Single image generation; prompt-based learning; text-guided image editing;
DOI
10.1109/ACCESS.2024.3487215
CLC Number
TP [automation technology, computer technology];
Discipline Code
0812;
Abstract
In this paper, we propose a novel multi-scale, prompt-based learning framework that uses a single image to perform image variation and text-based editing. Our approach captures the detailed internal information of a single image, enabling numerous variations while preserving the original features, and text-conditioned learning combines text and image information to support text-based editing from that single image. Specifically, we integrate a diffusion U-Net into a multi-scale framework so that both the quality and the internal structure of the input image are accurately captured, allowing diverse variations that maintain the characteristics of the original. We further utilize a pre-trained Bootstrapped Language-Image Pretraining (BLIP) model to generate candidate prompts for text-based editing and, using the prior knowledge of Contrastive Language-Image Pretraining (CLIP), feed the prompt that most closely matches the input image into the training process. To improve accuracy during the editing stage, we design a contrastive loss that strengthens the relevance between the prompt and the image, improving text-image alignment during training. Extensive experiments demonstrate the effectiveness of our method on text-based image editing tasks, showing that it significantly improves the performance of single-image generative models and opens new possibilities in the field of text-based image editing.
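The abstract describes generating candidate prompts with a pre-trained BLIP model and selecting the one closest to the input image via CLIP's prior knowledge. Below is a minimal sketch of that selection step, assuming the Hugging Face `transformers` checkpoints `Salesforce/blip-image-captioning-base` and `openai/clip-vit-base-patch32`; the paper's exact models, sampling settings, and file names are not specified in this record.

```python
import torch
from PIL import Image
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          CLIPProcessor, CLIPModel)

# Pre-trained captioner (BLIP) and scorer (CLIP); checkpoints are assumptions.
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("input.png").convert("RGB")  # hypothetical input path

# Generate several candidate prompts with nucleus sampling.
blip_inputs = blip_processor(images=image, return_tensors="pt")
sequences = blip.generate(**blip_inputs, do_sample=True, top_p=0.9,
                          num_return_sequences=5, max_new_tokens=30)
candidates = [blip_processor.decode(s, skip_special_tokens=True) for s in sequences]

# Score every candidate against the image with CLIP; keep the closest match.
clip_inputs = clip_processor(text=candidates, images=image,
                             return_tensors="pt", padding=True)
with torch.no_grad():
    logits = clip(**clip_inputs).logits_per_image  # shape (1, num_candidates)
best_prompt = candidates[logits.argmax().item()]
print(best_prompt)
```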
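The abstract also mentions a contrastive loss designed to enhance the relevance between the prompt and the image. The paper's exact formulation is not given in this record; a CLIP-style symmetric InfoNCE loss over normalized image and text embeddings is one common form such a loss takes, sketched here for illustration.

```python
import torch
import torch.nn.functional as F

def prompt_image_contrastive_loss(image_emb: torch.Tensor,
                                  text_emb: torch.Tensor,
                                  temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over paired (batch, dim) image/text embeddings.

    Row i of each tensor is assumed to be a matched image-prompt pair;
    all other rows in the batch serve as negatives.
    """
    # Cosine similarities via L2-normalized dot products, temperature-scaled.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature

    # Matched pairs sit on the diagonal; treat them as classification targets.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```

In a single-image setting, the batch could be formed from multi-scale crops or augmentations of the input paired with the selected prompt and distractor prompts; this is an illustration, not the authors' training recipe.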
Pages: 158810-158823
Number of pages: 14
相关论文
共 40 条
  • [1] Brown TB, 2020, Arxiv, DOI [arXiv:2005.14165, 10.48550/arXiv.2005.14165, DOI 10.48550/ARXIV.2005.14165]
  • [2] Bao HB, 2022, Arxiv, DOI arXiv:2111.02358
  • [3] Text2LIVE: Text-Driven Layered Image and Video Editing
    Bar-Tal, Omer
    Ofri-Amar, Dolev
    Fridman, Rafail
    Kasten, Yoni
    Dekel, Tali
    [J]. COMPUTER VISION - ECCV 2022, PT XV, 2022, 13675 : 707 - 723
  • [4] InstructPix2Pix: Learning to Follow Image Editing Instructions
    Brooks, Tim
    Holynski, Aleksander
    Efros, Alexei A.
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 18392 - 18402
  • [5] MOGAN: Morphologic-Structure-Aware Generative Learning From a Single Image
    Chen, Jinshu
    Xu, Qihui
    Kang, Qi
    Zhou, MengChu
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024, 54 (04): : 2021 - 2033
  • [6] Fellbaum C, 2010, THEORY AND APPLICATIONS OF ONTOLOGY: COMPUTER APPLICATIONS, P231, DOI 10.1007/978-90-481-8847-5_10
  • [7] Goodfellow IJ, 2014, ADV NEUR IN, V27, P2672
  • [8] Han FZ, 2024, Arxiv, DOI arXiv:2405.05769
  • [9] Hessel J., 2021, arXiv
  • [10] Improved Techniques for Training Single-Image GANs
    Hinz, Tobias
    Fisher, Matthew
    Wang, Oliver
    Wermter, Stefan
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1299 - 1308