WAVELET-GUIDED ACCELERATION OF TEXT INVERSION IN DIFFUSION-BASED IMAGE EDITING

Times cited: 2
Authors
Koo, Gwanhyeong [1]
Yoon, Sunjae [1]
Yoo, Chang D. [1]
Affiliation
[1] Korea Adv Inst Sci & Technol KAIST, Daejeon, South Korea
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
Image editing; Null-Text Inversion; text optimization; diffusion model
DOI
10.1109/ICASSP48485.2024.10446603
Abstract
In the field of image editing, Null-text Inversion (NTI) enables fine-grained editing while preserving the structure of the original image by optimizing null embeddings during the DDIM sampling process. However, the NTI process is time-consuming, taking more than two minutes per image. To address this, we introduce an innovative method that maintains the principles of the NTI while accelerating the image editing process. We propose the WaveOpt-Estimator, which determines the text optimization endpoint based on frequency characteristics. Utilizing wavelet transform analysis to identify the image's frequency characteristics, we can limit text optimization to specific timesteps during the DDIM sampling process. By adopting the Negative-Prompt Inversion (NPI) concept, a target prompt representing the original image serves as the initial text value for optimization. This approach maintains performance comparable to NTI while reducing the average editing time by over 80% compared to the NTI method. Our method presents a promising approach for efficient, high-quality image editing based on diffusion models.
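The abstract's idea of using wavelet analysis to decide where text optimization stops can be illustrated with a minimal sketch. It is not the authors' WaveOpt-Estimator, whose details are not given here. It assumes a single-level Haar transform, a grayscale image with even side lengths, and a 50-step DDIM schedule. The names `haar_dwt2`, `high_freq_ratio`, and `optimization_endpoint`, and the linear rule mapping high-frequency energy to a step count, are hypothetical placeholders.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform (even-sized 2D array)."""
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0   # low-frequency band
    det1 = (a - b + c - d) / 2.0     # detail band 1
    det2 = (a + b - c - d) / 2.0     # detail band 2
    det3 = (a - b - c + d) / 2.0     # detail band 3 (diagonal)
    return approx, det1, det2, det3


def high_freq_ratio(img):
    """Fraction of image energy that falls in the three detail bands."""
    approx, det1, det2, det3 = haar_dwt2(np.asarray(img, dtype=np.float64))
    detail = sum(float((band ** 2).sum()) for band in (det1, det2, det3))
    total = detail + float((approx ** 2).sum())
    return 0.0 if total == 0.0 else detail / total


def optimization_endpoint(img, num_steps=50, min_steps=5):
    """Hypothetical rule: number of DDIM steps during which the text
    embedding is optimized, growing linearly with high-frequency energy.
    The paper's actual estimator may differ in form and direction."""
    ratio = high_freq_ratio(img)
    steps = min_steps + ratio * (num_steps - min_steps)
    return int(round(min(max(steps, min_steps), num_steps)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))  # stand-in for a real grayscale image
    print(high_freq_ratio(image), optimization_endpoint(image))
```

Under this placeholder rule, a flat image stops optimization after `min_steps`, while an image whose energy lies entirely in the detail bands is optimized for all `num_steps`.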
Pages: 4380-4384
Page count: 5