Conditional Score Guidance for Text-Driven Image-to-Image Translation

Cited by: 0

Authors:

Lee, Hyunsoo [1]

Kang, Minsoo [1]

Han, Bohyung [1,2]

Affiliations:

[1] Seoul Natl Univ, ECE, Seoul, South Korea

[2] Seoul Natl Univ, IPAI, Seoul, South Korea

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Funding:

National Research Foundation, Singapore

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our method aims to generate a target image by selectively editing the regions of interest in a source image, as specified by a modifying text, while preserving the remaining parts. In contrast to existing techniques that rely solely on a target prompt, we introduce a new score function that additionally considers both the source image and the source text prompt, tailored to specific translation tasks. To this end, we derive the conditional score function in a principled way, decomposing it into the standard score and a guiding term for target image generation. To compute the gradient of the guiding term, we assume that the posterior distribution is Gaussian and estimate its mean and variance to adjust the gradient without additional training. In addition, to improve the quality of the conditional score guidance, we incorporate a simple yet effective mixup technique that combines two cross-attention maps derived from the source and target latents. This strategy promotes a desirable fusion of the invariant parts of the source image and the edited regions aligned with the target prompt, leading to high-fidelity target image generation. Through comprehensive experiments, we demonstrate that our approach achieves outstanding image-to-image translation performance on various tasks. Code is available at https://github.com/Hleephilip/CSG.
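
Note: the decomposition described in the abstract can be illustrated with the generic Bayes-rule identity for a conditional score. The sketch below is not taken from the paper. The symbols x_t (noisy latent at step t), c (the extra conditioning information), and \mu, \sigma (mean and standard deviation of an assumed Gaussian) are assumptions introduced here for illustration, and the paper's exact conditioning variables and estimators may differ.

```latex
% Bayes' rule: p(x_t | c) \propto p(x_t) \, p(c | x_t), so the score splits in two.
\nabla_{x_t} \log p(x_t \mid c)
  = \underbrace{\nabla_{x_t} \log p(x_t)}_{\text{standard score}}
  + \underbrace{\nabla_{x_t} \log p(c \mid x_t)}_{\text{guiding term}}

% Illustrative Gaussian assumption: p(c | x_t) = N(c; \mu(x_t), \sigma^2 I),
% with \sigma^2 treated as constant with respect to x_t.
\nabla_{x_t} \log p(c \mid x_t)
  = -\frac{1}{2\sigma^{2}} \, \nabla_{x_t} \bigl\lVert c - \mu(x_t) \bigr\rVert_2^{2}
```

Under this assumption, a smaller estimated variance gives the guiding term more weight relative to the standard score, which is one way that estimating the mean and variance could adjust the gradient without additional training.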


Pages: 24
