JurassicWorld Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation

Cited by: 0

Authors:

Martin, Alexander [1]

Zheng, Haitian [1]

An, Jie [1]

Luo, Jiebo [1]

Affiliations:

[1] Univ Rochester, Rochester, NY 14627 USA

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Keywords:

image-to-image translation; large domain gap; stable diffusion;

DOI:

10.1145/3581783.3612708

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With a strong understanding of the target domain from natural language, we produce promising results in translating across large domain gaps and bringing skeletons back to life. In this work, we use text-guided latent diffusion models for zero-shot image-to-image translation (I2I) across large domain gaps (longI2I), where large amounts of new visual features and new geometry need to be generated to enter the target domain. Being able to perform translations across large domain gaps has a wide variety of real-world applications in criminology, astrology, environmental conservation, and paleontology. In this work, we introduce a new task, Skull2Animal, for translating between skulls and living animals. On this task, we find that unguided Generative Adversarial Networks (GANs) are not capable of translating across large domain gaps. Instead of these traditional I2I methods, we explore the use of guided diffusion and image editing models and provide a new benchmark model, Revive2I, capable of performing zero-shot I2I via text-prompting latent diffusion models. We find that guidance is necessary for longI2I because, to bridge the large domain gap, prior knowledge about the target domain is needed. In addition, we find that prompting provides the best and most scalable information about the target domain, as classifier-guided diffusion models require retraining for specific use cases and lack stronger constraints on the target domain because of the wide variety of images they are trained on.

Citation

Pages: 9320-9328

Page count: 9

