InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer

Authors
Soohyun Kim
Jongbeom Baek
Jihye Park
Eunjae Ha
Homin Jung
Taeyoung Lee
Seungryong Kim
Affiliations
[1] Korea University
[2] Hanwha Systems Co., Ltd
Keywords
Image-to-image translation; GANs; Instance-aware image-to-image translation; Vision and language
DOI
Not available
Abstract
We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, that effectively integrates global- and instance-level information. By treating content features extracted from an image as visual tokens, our model discovers a global consensus of content features, taking context information into account through the self-attention module of Transformers. By augmenting these tokens with instance-level features extracted from the content features according to bounding box information, our framework learns the interaction between object instances and the global image, thus boosting instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable multi-modal translation with style codes. In addition, to improve instance-awareness and translation quality in object regions, we present an instance-level content contrastive loss defined between the input and translated images. Although InstaFormer attains competitive performance, it has some limitations, namely limited scalability in handling multiple domains and reliance on domain annotations. To overcome these, we propose InstaFormer++ as an extension of InstaFormer, which enables multi-domain translation in instance-aware image translation for the first time. We propose to obtain pseudo domain labels by leveraging a list of candidate domain labels in text format and a pretrained vision-language model. We conduct experiments to demonstrate the effectiveness of our methods over the latest methods and provide extensive ablation studies.
Pages: 1167 - 1186
Page count: 19
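The abstract's replacement of LayerNorm with AdaIN can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `adain`, the token layout (a list of N tokens with C channels each), and the choice to compute statistics per channel across tokens are assumptions made here for illustration. The scale `gamma` and shift `beta` stand in for parameters that would be predicted from a style code.

```python
import math

def adain(tokens, gamma, beta, eps=1e-5):
    """Adaptive instance normalization over a token sequence (illustrative).

    tokens: list of N tokens, each a list of C floats (assumed layout).
    gamma, beta: per-channel scale and shift, assumed to come from a style code.
    """
    n = len(tokens)
    c = len(gamma)
    out = [[0.0] * c for _ in range(n)]
    for ch in range(c):
        # Per-channel statistics across all tokens of this one image.
        column = [tok[ch] for tok in tokens]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        std = math.sqrt(var + eps)
        for i in range(n):
            # Normalize, then apply the style-dependent scale and shift.
            out[i][ch] = gamma[ch] * (tokens[i][ch] - mean) / std + beta[ch]
    return out

tokens = [[1.0, 10.0], [3.0, 30.0]]
styled = adain(tokens, gamma=[2.0, 1.0], beta=[0.5, -1.0])
```

Unlike LayerNorm, whose scale and shift are fixed learned parameters, the affine parameters here change with the style input, which is what would allow different style codes to yield different translations of the same content tokens.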