InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer

Cited by: 0

Authors:

Soohyun Kim

Jongbeom Baek

Jihye Park

Eunjae Ha

Homin Jung

Taeyoung Lee

Seungryong Kim

Affiliations:

[1] Korea University

[2] Hanwha Systems Co., Ltd

Source:

International Journal of Computer Vision | 2024 / Vol. 132

Keywords:

Image-to-image translation; GANs; Instance-aware image-to-image translation; Vision and language

DOI:

Not available

Abstract:

We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, to effectively integrate global- and instance-level information. By treating content features extracted from an image as visual tokens, our model discovers a global consensus of content features by considering context information through the self-attention module of Transformers. By augmenting these tokens with instance-level features extracted from the content features according to bounding box information, our framework can learn the interaction between object instances and the global image, thus boosting instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable multi-modal translation with style codes. In addition, to improve instance-awareness and translation quality at object regions, we present an instance-level content contrastive loss defined between the input and translated images. Although InstaFormer attains competitive performance, it has some limitations, namely limited scalability in handling multiple domains and reliance on domain annotations. To overcome these, we propose InstaFormer++ as an extension of InstaFormer, which enables multi-domain translation in instance-aware image translation for the first time. We propose to obtain pseudo domain labels by leveraging a list of candidate domain labels in text format and a pretrained vision-language model. We conduct experiments to demonstrate the effectiveness of our methods over the latest methods and provide extensive ablation studies.
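
As an illustration of the AdaIN step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that statistics are computed per channel across the tokens of one image, and that the scale `gamma` and shift `beta` have already been predicted from a style code; the function name `adain_tokens` and the way `gamma` and `beta` are obtained are hypothetical.

```python
from statistics import fmean, pstdev


def adain_tokens(tokens, gamma, beta, eps=1e-5):
    """Adaptive instance normalization over a token sequence (sketch).

    tokens: list of tokens, each a list of per-channel floats.
    gamma, beta: per-channel scale and shift, assumed to come from a
                 style code (that mapping is not shown here).
    """
    num_channels = len(tokens[0])
    # Gather each channel's values across all tokens.
    columns = [[tok[c] for tok in tokens] for c in range(num_channels)]
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    # Normalize each channel, then apply the style-dependent scale and shift.
    return [
        [
            gamma[c] * (tok[c] - means[c]) / (stds[c] + eps) + beta[c]
            for c in range(num_channels)
        ]
        for tok in tokens
    ]


tokens = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
gamma = [2.0, 1.0]
beta = [0.5, -1.0]
styled = adain_tokens(tokens, gamma, beta)
```

After this step, each channel of `styled` has a mean close to the corresponding `beta` and a standard deviation close to the corresponding `gamma`, which is how a style code can steer the appearance of the output while the token content is preserved up to normalization.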

Pages: 1167-1186

Page count: 19

50 items in total

[1] InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer
Kim, Soohyun
Baek, Jongbeom
Park, Jihye
Ha, Eunjae
Jung, Homin
Lee, Taeyoung
Kim, Seungryong
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (04) : 1167 - 1186
[2] InstaFormer: Instance-Aware Image-to-Image Translation with Transformer
Kim, Soohyun
Baek, Jongbeom
Park, Jihye
Kim, Gyeongnyeon
Kim, Seungryong
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18300 - 18310
[3] RHN: RoI Restricted Hybrid Network for Instance-Aware Image-to-Image Translation
Liu, Yaqi
Wang, Hanhan
Zhang, Jianyi
Xiao, Song
Cai, Qiang
IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 1156 - 1160
[4] DMDIT: Diverse multi-domain image-to-image translation
Shao, Mingwen
Zhang, Youcai
Liu, Huan
Wang, Chao
Li, Le
Shao, Xun
KNOWLEDGE-BASED SYSTEMS, 2021, 229
[5] Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation
Gomez, Raul
Liu, Yahui
De Nadai, Marco
Karatzas, Dimosthenis
Lepri, Bruno
Sebe, Nicu
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3164 - 3172
[6] Multi-Domain Image-to-Image Translation with Adaptive Inference Graph
The-Phuc Nguyen
Lathuiliere, Stephane
Ricci, Elisa
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5368 - 5375
[7] Multi-Domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution
Jeong, Somi
Lee, Jiyoung
Sohn, Kwanghoon
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1750 - 1754
[8] Cross-Granularity Learning for Multi-Domain Image-to-Image Translation
Fu, Huiyuan
Yu, Ting
Wang, Xin
Ma, Huadong
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3099 - 3107
[9] Multi-Domain Image-to-Image Translation via a Unified Circular Framework
Wang, Yuxi
Zhang, Zhaoxiang
Hao, Wangli
Song, Chunfeng
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 670 - 684
[10] RelGAN: Multi-Domain Image-to-Image Translation via Relative Attributes
Wu, Po-Wei
Lin, Yu-Jing
Chang, Che-Han
Chang, Edward Y.
Liao, Shih-Wei
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5913 - 5921
