InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer

Cited by: 0
Authors
Soohyun Kim
Jongbeom Baek
Jihye Park
Eunjae Ha
Homin Jung
Taeyoung Lee
Seungryong Kim
Affiliations
[1] Korea University
[2] Hanwha Systems Co., Ltd
Source
Keywords
Image-to-image translation; GANs; Instance-aware image-to-image translation; Vision and language
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, that effectively integrates global- and instance-level information. By treating content features extracted from an image as visual tokens, our model discovers a global consensus among content features, taking context information into account through the self-attention module of Transformers. By augmenting these tokens with instance-level features extracted from the content features using bounding box information, our framework learns the interaction between object instances and the global image, thus boosting instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable multi-modal translation with style codes. In addition, to improve instance-awareness and translation quality at object regions, we present an instance-level content contrastive loss defined between the input and translated images. Although InstaFormer can attain competitive performance, it may face some limitations, namely limited scalability in handling multiple domains and reliance on domain annotations. To overcome these, we propose InstaFormer++ as an extension of InstaFormer, which enables multi-domain translation in instance-aware image translation for the first time. We propose obtaining pseudo domain labels by leveraging a list of candidate domain labels in text format and a pretrained vision-language model. We conduct experiments to demonstrate the effectiveness of our methods over the latest methods and provide extensive ablation studies.
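As a rough illustration of the normalization swap described in the abstract, the sketch below applies adaptive instance normalization to a small set of visual tokens. It is not the authors' implementation. The function name `adain`, the choice to compute statistics per channel across tokens, and the `scale` and `shift` vectors (standing in for parameters that would be predicted from a style code) are all assumptions made for this example.

```python
import math


def adain(tokens, scale, shift, eps=1e-5):
    """Adaptive instance normalization over a list of tokens (illustrative only).

    tokens: list of N tokens, each a list of C floats.
    scale, shift: lists of C floats; hypothetical stand-ins for values
        that a style code would be mapped to.
    Assumption: mean and variance are taken per channel across the N tokens.
    """
    n = len(tokens)
    c = len(tokens[0])
    out = [[0.0] * c for _ in range(n)]
    for j in range(c):
        column = [t[j] for t in tokens]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            # Normalize the content feature, then re-style it with scale and shift.
            out[i][j] = scale[j] * (column[i] - mean) * inv_std + shift[j]
    return out


if __name__ == "__main__":
    # Two tokens with two channels each; values are made up for the example.
    tokens = [[1.0, 10.0], [3.0, 30.0]]
    print(adain(tokens, scale=[2.0, 1.0], shift=[0.5, -1.0]))
```

Unlike LayerNorm, whose scale and shift are learned constants, the scale and shift here are inputs, so sampling a different style code would change the output statistics while the normalized content stays the same.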
Pages: 1167-1186
Page count: 19
Related papers
50 results in total
  • [1] InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer
    Kim, Soohyun
    Baek, Jongbeom
    Park, Jihye
    Ha, Eunjae
    Jung, Homin
    Lee, Taeyoung
    Kim, Seungryong
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (04) : 1167 - 1186
  • [2] InstaFormer: Instance-Aware Image-to-Image Translation with Transformer
    Kim, Soohyun
    Baek, Jongbeom
    Park, Jihye
    Kim, Gyeongnyeon
    Kim, Seungryong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18300 - 18310
  • [3] RHN: RoI Restricted Hybrid Network for Instance-Aware Image-to-Image Translation
    Liu, Yaqi
    Wang, Hanhan
    Zhang, Jianyi
    Xiao, Song
    Cai, Qiang
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 1156 - 1160
  • [4] DMDIT: Diverse multi-domain image-to-image translation
    Shao, Mingwen
    Zhang, Youcai
    Liu, Huan
    Wang, Chao
    Li, Le
    Shao, Xun
    KNOWLEDGE-BASED SYSTEMS, 2021, 229
  • [5] Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation
    Gomez, Raul
    Liu, Yahui
    De Nadai, Marco
    Karatzas, Dimosthenis
    Lepri, Bruno
    Sebe, Nicu
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3164 - 3172
  • [6] Multi-Domain Image-to-Image Translation with Adaptive Inference Graph
    The-Phuc Nguyen
    Lathuiliere, Stephane
    Ricci, Elisa
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5368 - 5375
  • [7] Multi-Domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution
    Jeong, Somi
    Lee, Jiyoung
    Sohn, Kwanghoon
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1750 - 1754
  • [8] Cross-Granularity Learning for Multi-Domain Image-to-Image Translation
    Fu, Huiyuan
    Yu, Ting
    Wang, Xin
    Ma, Huadong
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3099 - 3107
  • [9] Multi-Domain Image-to-Image Translation via a Unified Circular Framework
    Wang, Yuxi
    Zhang, Zhaoxiang
    Hao, Wangli
    Song, Chunfeng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 670 - 684
  • [10] RelGAN: Multi-Domain Image-to-Image Translation via Relative Attributes
    Wu, Po-Wei
    Lin, Yu-Jing
    Chang, Che-Han
    Chang, Edward Y.
    Liao, Shih-Wei
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5913 - 5921