Zero-shot Image-to-Image Translation

Cited by: 101
Authors
Parmar, Gaurav [1]
Singh, Krishna Kumar [2]
Zhang, Richard [3]
Li, Yijun [4]
Lu, Jingwan [2]
Zhu, Jun-Yan [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Adobe, San Jose, CA USA
[3] Adobe, San Francisco, CA USA
[4] Adobe, Seattle, WA USA
Keywords
Image Editing; Diffusion Models; Deep Generative Models
DOI
10.1145/3588432.3591513
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse, high-quality images. However, directly applying these models for real image editing remains challenging for two reasons. First, it is hard for users to craft a perfect text prompt depicting every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we introduce pix2pix-zero, an image-to-image translation method that can preserve the original image's content without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the content structure, we propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. Finally, to enable interactive editing, we distill the diffusion model into a fast conditional GAN. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model.
Pages: 11