EM-LAST: Effective Multidimensional Latent Space Transport for an Unpaired Image-to-Image Translation With an Energy-Based Model

Cited by: 2
Authors
Han, Giwoong [1]
Min, Jinhong [1]
Han, Sung Won [1]
Affiliations
[1] Korea Univ, Sch Ind & Management Engn, Seoul 02841, South Korea
Keywords
Task analysis; Aerospace electronics; Visualization; Licenses; Generative adversarial networks; Deep learning; Decoding; Energy-based model; Image-to-image translation; Langevin dynamics; Multidimensional latent space; Vector-quantized variational autoencoder
DOI
10.1109/ACCESS.2022.3189352
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
For unpaired image-to-image translation to work effectively, the latent space of each image domain must be well designed. The codes of each style must be translated toward the target while preserving the parts that correspond to the source content. Most Variational Autoencoder (VAE)-based models use a one-dimensional latent space. However, applying high-dimensional methodologies such as vector quantization requires controlling a multidimensional latent space. In this study, among the VAE-based models that use relatively complex multidimensional latent spaces, we apply an Energy-Based Model and Vector-Quantized VAE v2, with the latter as the main model. We show that, among the latent spaces that represent each image domain, the importance of each feature in the top and bottom latent spaces must be interpreted differently for appropriate translation. We therefore argue that a good understanding of how the latent space is composed is enough to produce effective image translation results. We also present various analyses and visual outcomes of multidimensional latent space transport.
Pages: 72839-72849
Page count: 11