DiffI2I: Efficient Diffusion Model for Image-to-Image Translation

Times Cited: 0
Authors
Xia, Bin [1 ]
Zhang, Yulun [2 ]
Wang, Shiyin [3 ]
Wang, Yitong [3 ]
Wu, Xinglong [3 ]
Tian, Yapeng [4 ]
Yang, Wenming [1 ]
Timofte, Radu [5 ,6 ]
Van Gool, Luc [2 ]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
[2] Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland
[3] Bytedance Inc, Shenzhen 518055, Peoples R China
[4] Univ Texas Dallas, Dept Comp Sci, Richardson, TX 75080 USA
[5] Univ Wurzburg, Comp Vis Lab, IFI, D-97070 Wurzburg, Germany
[6] Univ Wurzburg, CAIDAS, D-97070 Wurzburg, Germany
Funding
National Natural Science Foundation of China;
Keywords
Intellectual property; Noise reduction; Runtime; Image synthesis; Transformers; Semantic segmentation; Image restoration; Diffusion processes; Dense prediction; diffusion model; image restoration; image-to-image translation; inpainting; motion deblurring; super-resolution; SUPERRESOLUTION; RESTORATION; NETWORK;
DOI
10.1109/TPAMI.2024.3498003
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Diffusion Model (DM) has emerged as the SOTA approach for image synthesis. However, existing DMs perform poorly on some image-to-image translation (I2I) tasks. Unlike image synthesis, some I2I tasks, such as super-resolution, require generating results consistent with the ground-truth (GT) images. Traditional DMs for image synthesis require extensive iterations and large denoising models to estimate entire images, which gives them strong generative ability but also leads to artifacts and inefficiency for I2I. To tackle this challenge, we propose a simple, efficient, and powerful DM framework for I2I, called DiffI2I. Specifically, DiffI2I comprises three key components: a compact I2I prior extraction network (CPEN), a dynamic I2I transformer (DI2Iformer), and a denoising network. We train DiffI2I in two stages: pretraining and DM training. For pretraining, GT and input images are fed into CPEN_S1 to capture a compact I2I prior representation (IPR) guiding DI2Iformer. In the second stage, the DM is trained to estimate the same IPR as CPEN_S1 using only the input images. Compared to traditional DMs, the compact IPR enables DiffI2I to obtain more accurate outcomes and to employ a lighter denoising network and fewer iterations. Through extensive experiments on various I2I tasks, we demonstrate that DiffI2I achieves SOTA performance while significantly reducing computational burdens.
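The two-stage scheme described in the abstract can be sketched as follows. This is a toy illustration only: CPEN_S1 is stood in for by a fixed random projection, the learned denoiser by a placeholder callable, and all dimensions and the `perfect_step` function are hypothetical choices, not the paper's architecture. The point is the structure: the prior is a small vector (not a full-image latent), so the second-stage diffusion chain over it can be short and light.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative): the IPR is a small vector, unlike a
# full-image latent, which is what allows a light denoiser and few steps.
IMG, IPR_DIM, T = 64, 8, 4  # image size, prior size, diffusion iterations

# Hypothetical stand-in for the pretrained CPEN_S1: a fixed random
# projection compressing concatenated GT + input images into the IPR.
W = rng.standard_normal((IPR_DIM, 2 * IMG)) / np.sqrt(2 * IMG)

def cpen_s1(gt, lq):
    """Stage 1: extract a compact I2I prior representation (IPR)."""
    return W @ np.concatenate([gt, lq])

def dm_estimate_ipr(lq, denoise_step):
    """Stage 2: run a short diffusion chain over the IPR, conditioned on
    the input image only (GT is unavailable at inference time)."""
    z = rng.standard_normal(IPR_DIM)  # start from pure noise
    for t in range(T, 0, -1):
        z = denoise_step(z, lq, t)
    return z

# The training target: the DM's output should match CPEN_S1's IPR.
gt, lq = rng.standard_normal(IMG), rng.standard_normal(IMG)
target_ipr = cpen_s1(gt, lq)

# A trivially "perfect" denoiser that interpolates toward the target,
# just to show the L2 objective collapsing to ~0 within T = 4 iterations.
def perfect_step(z, lq, t):
    return z + (target_ipr - z) / t

est_ipr = dm_estimate_ipr(lq, perfect_step)
loss = float(np.mean((est_ipr - target_ipr) ** 2))
print(f"IPR dim = {IPR_DIM} (vs. image dim {IMG}), L2 loss = {loss:.2e}")
```

In the paper the denoising step is a learned network and CPEN_S1 is trained jointly with DI2Iformer; here both are placeholders chosen so the script runs end to end.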
Pages: 1578 - 1593
Page count: 16