InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer

Cited by: 0

Authors:

Soohyun Kim

Jongbeom Baek

Jihye Park

Eunjae Ha

Homin Jung

Taeyoung Lee

Seungryong Kim

Affiliations:

[1] Korea University

[2] Hanwha Systems Co., Ltd

Source:

International Journal of Computer Vision | 2024 / Vol. 132

Keywords:

Image-to-image translation; GANs; Instance-aware image-to-image translation; Vision and language

DOI:

Not available

Abstract:

We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, that effectively integrates global- and instance-level information. By treating content features extracted from an image as visual tokens, our model discovers a global consensus of content features by considering context information through the self-attention module of Transformers. By augmenting these tokens with instance-level features extracted from the content features with respect to bounding box information, our framework learns the interaction between object instances and the global image, thus boosting instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable multi-modal translation with style codes. In addition, to improve instance-awareness and translation quality at object regions, we present an instance-level content contrastive loss defined between the input and translated images. Although InstaFormer attains competitive performance, it may face some limitations, namely limited scalability in handling multiple domains and reliance on domain annotations. To overcome these, we propose InstaFormer++ as an extension of InstaFormer, which enables multi-domain translation in instance-aware image translation for the first time. We propose to obtain pseudo domain labels by leveraging a list of candidate domain labels in text format and a pretrained vision-language model. We conduct experiments to demonstrate the effectiveness of our methods over the latest methods and provide extensive ablation studies.
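
The LayerNorm-to-AdaIN swap mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, tensor shapes, and the choice to compute AdaIN statistics per channel across all tokens are assumptions made for illustration, and the paper may treat instance tokens differently. Here `gamma` and `beta` stand in for scale and shift parameters that would be predicted from a style code.

```python
import numpy as np


def layer_norm(tokens, eps=1e-5):
    """LayerNorm without learned affine: normalize each token over its channels."""
    mean = tokens.mean(axis=-1, keepdims=True)
    var = tokens.var(axis=-1, keepdims=True)
    return (tokens - mean) / np.sqrt(var + eps)


def adain(tokens, gamma, beta, eps=1e-5):
    """AdaIN sketch: normalize each channel over the token axis,
    then apply a style-dependent scale (gamma) and shift (beta).

    tokens: (num_tokens, channels); gamma, beta: (channels,).
    """
    mean = tokens.mean(axis=0, keepdims=True)
    var = tokens.var(axis=0, keepdims=True)
    normalized = (tokens - mean) / np.sqrt(var + eps)
    return gamma * normalized + beta


# Hypothetical inputs: 16 content tokens with 8 channels each.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
gamma = rng.uniform(0.5, 1.5, size=8)  # assumed to come from a style code
beta = rng.normal(size=8)              # assumed to come from a style code

out = adain(tokens, gamma, beta)
ln_out = layer_norm(tokens)
```

In this sketch, the per-channel mean and standard deviation of `out` across tokens match `beta` and `gamma`, which is the mechanism by which different style codes can produce different translations of the same content tokens.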

Pages: 1167-1186

Page count: 19

