Saliency-guided image translation

Cited by: 0
Authors
Jiang, Lai [1 ]
Dai, Ning [1 ]
Xu, Mai [1 ]
Deng, Xin [2 ]
Li, Shengxi [1 ]
Affiliations
[1] School of Electronic and Information Engineering, Beihang University, Beijing
[2] School of Cyber Science and Technology, Beihang University, Beijing
Source
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics | 2023, Vol. 49, No. 10
Funding
National Natural Science Foundation of China
Keywords
attention mechanism; dataset; generative adversarial network; image translation; saliency;
DOI
10.13700/j.bh.1001-5965.2021.0732
Abstract
This paper proposes a novel task of saliency-guided image translation, whose goal is image-to-image translation conditioned on a user-specified saliency map. To address this problem, we develop a novel generative adversarial network (GAN) based model, called SalG-GAN. Given an original image and a target saliency map, SalG-GAN generates a translated image that satisfies the target saliency map. A disentangled representation framework is proposed to encourage the model to learn diverse translations for the same target saliency condition. A saliency-based attention module is introduced as a special attention mechanism to facilitate the structures of the saliency-guided generator, the saliency cue encoder, and the saliency-guided global and local discriminators. Furthermore, we build a synthetic dataset and a real-world dataset with labeled visual attention for training and evaluating the proposed method. Experimental results on both datasets verify the effectiveness of our model for saliency-guided image translation. © 2023 Beijing University of Aeronautics and Astronautics (BUAA). All rights reserved.
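The record contains no code, so the PyTorch module below is purely an illustrative sketch of one plausible form a saliency-based attention block of the kind described in the abstract might take: features are re-weighted by a gate computed from the user-specified saliency map. The class name SaliencyAttention, the convolutional gating design, and all parameters are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyAttention(nn.Module):
    """Hypothetical saliency-based attention block (not from the paper):
    re-weights feature maps with a gate derived from a user-specified
    saliency map, so salient regions receive stronger modulation."""

    def __init__(self, feat_channels: int):
        super().__init__()
        # Map the 1-channel saliency map to a per-channel, per-pixel gate.
        self.gate = nn.Sequential(
            nn.Conv2d(1, feat_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, features: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
        # Resize the saliency map to the spatial size of the feature map.
        saliency = F.interpolate(saliency, size=features.shape[-2:],
                                 mode="bilinear", align_corners=False)
        attn = self.gate(saliency)          # gate in [0, 1], shape (N, C, H, W)
        return features + features * attn   # residual path keeps non-salient content

# Usage sketch: 64-channel generator features and a full-resolution saliency map.
feats = torch.randn(1, 64, 32, 32)
sal = torch.rand(1, 1, 256, 256)
print(SaliencyAttention(64)(feats, sal).shape)  # torch.Size([1, 64, 32, 32])
```

A residual formulation is used here so that regions with low saliency pass through largely unchanged; whether SalG-GAN itself gates features this way is not stated in the record.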
Pages: 2689-2698
Page count: 9