Saliency-guided image translation

Cited by: 0
Authors
Jiang, Lai [1 ]
Dai, Ning [1 ]
Xu, Mai [1 ]
Deng, Xin [2 ]
Li, Shengxi [1 ]
Affiliations
[1] School of Electronic and Information Engineering, Beihang University, Beijing
[2] School of Cyber Science and Technology, Beihang University, Beijing
Source
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics | 2023, Vol. 49, No. 10
Funding
National Natural Science Foundation of China
Keywords
attention mechanism; dataset; generative adversarial network; image translation; saliency;
DOI
10.13700/j.bh.1001-5965.2021.0732
Abstract
This paper proposes a novel task of saliency-guided image translation, whose goal is image-to-image translation conditioned on a user-specified saliency map. To address this problem, we develop a novel generative adversarial network (GAN) based model, called SalG-GAN. Given an original image and a target saliency map, SalG-GAN generates a translated image that satisfies the target saliency map. A disentangled representation framework is proposed to encourage the model to learn diverse translations for the same target saliency condition. A saliency-based attention module is introduced as a special attention mechanism to facilitate the structures of the saliency-guided generator, the saliency cue encoder, and the saliency-guided global and local discriminators. Furthermore, we build a synthetic dataset and a real-world dataset with labeled visual attention for training and evaluating the proposed method. Experimental results on both datasets verify the effectiveness of our model for saliency-guided image translation. © 2023 Beijing University of Aeronautics and Astronautics (BUAA). All rights reserved.
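The record contains no code, so the PyTorch module below is purely an illustrative sketch of one plausible form a saliency-based attention block of the kind described in the abstract might take: features are re-weighted by a gate computed from the user-specified saliency map. The class name SaliencyAttention, the convolutional gating design, and all parameters are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyAttention(nn.Module):
    """Hypothetical saliency-based attention block (not from the paper):
    re-weights feature maps with a gate derived from a user-specified
    saliency map, so salient regions receive stronger modulation."""

    def __init__(self, feat_channels: int):
        super().__init__()
        # Map the 1-channel saliency map to a per-channel, per-pixel gate.
        self.gate = nn.Sequential(
            nn.Conv2d(1, feat_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, features: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
        # Resize the saliency map to the spatial size of the feature map.
        saliency = F.interpolate(saliency, size=features.shape[-2:],
                                 mode="bilinear", align_corners=False)
        attn = self.gate(saliency)          # gate in [0, 1], shape (N, C, H, W)
        return features + features * attn   # residual path keeps non-salient content

# Usage sketch: 64-channel generator features and a full-resolution saliency map.
feats = torch.randn(1, 64, 32, 32)
sal = torch.rand(1, 1, 256, 256)
print(SaliencyAttention(64)(feats, sal).shape)  # torch.Size([1, 64, 32, 32])
```

A residual formulation is used here so that regions with low saliency pass through largely unchanged; whether SalG-GAN itself gates features this way is not stated in the record.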
Pages: 2689-2698
Page count: 9